Abstract
The frequent occurrence of urban waterlogging seriously affects people’s lives and the national economy. Using machine learning (ML) methods to spatially assess urban waterlogging susceptibility is critical for reducing the losses caused by such disasters. It is important to select an equal number of positive and negative samples when training binary ML classifiers for this evaluation; however, in most cases researchers can obtain only a relatively small number of historical waterlogging locations (positive samples), which limits the number of negative samples that can be selected and, in turn, degrades the trained classifiers’ performance. To address this issue, we propose an optimized seed spread algorithm (OSSA) that estimates potentially inundated areas based on the spatial distribution of elevation and natural waters, thereby increasing the number of positive samples. The primary urban area of Guangzhou, China, was selected as the study region, and random forest (RF) was selected as the evaluation algorithm. We further employed two other ML methods, support vector machine (SVM) and logistic regression (LR), to verify the quality of the added positive samples. The results indicate that, compared with the original positive samples, the OSSA-based positive samples achieve the highest area under the curve values across the three tested ML methods, indicating that the OSSA can be a suitable approach to increasing the number of positive samples in such studies. We believe that this study advances ML-based waterlogging susceptibility assessment, which could be valuable for developing countries where intensive hydrologic monitoring is lacking.
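As a rough illustration of the general seed spread idea mentioned above, the following Python sketch grows a flooded region outward from seed cells on a small elevation grid. It is a generic flood-fill sketch, not the OSSA itself: the function name `seed_spread`, the fixed `water_level` threshold, the 4-neighbour connectivity, and the toy grid are all assumptions made for this example, and the optimization steps and the treatment of natural waters developed in the study are not reproduced here.

```python
from collections import deque


def seed_spread(elevation, seeds, water_level):
    """Generic seed-spread (flood-fill) sketch, NOT the authors' OSSA.

    Returns the set of (row, col) cells that can be reached from the seed
    cells through 4-connected neighbours whose elevation does not exceed
    the assumed water_level.
    """
    rows, cols = len(elevation), len(elevation[0])
    flooded = set()
    queue = deque()

    # Only seeds that lie at or below the water level can start spreading.
    for r, c in seeds:
        if elevation[r][c] <= water_level:
            flooded.add((r, c))
            queue.append((r, c))

    # Breadth-first spread to low-lying, connected neighbours.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in flooded \
                    and elevation[nr][nc] <= water_level:
                flooded.add((nr, nc))
                queue.append((nr, nc))
    return flooded


# Hypothetical 4 x 4 elevation grid (arbitrary units).
dem = [
    [5, 5, 5, 5],
    [5, 1, 2, 5],
    [5, 5, 2, 5],
    [1, 5, 5, 5],
]

# Spread from one known waterlogging location at row 1, column 1.
flooded = seed_spread(dem, seeds=[(1, 1)], water_level=2)
print(sorted(flooded))
```

In this toy grid the low cell at row 3, column 0 is not marked, because it is not connected to the seed through other low-lying cells. This connectivity requirement is what distinguishes a seed spread approach from simply thresholding the elevation.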
Data availability
The data of this study are available at the permanent link: https://figshare.com/articles/dataset/codes_data/13286099.
Code availability
The RF code, which is open source, is available on GitHub at https://github.com/belindee/RandomForcastToiletries. The LR code uses built-in functions in MATLAB 2017a. The SVM and PSO code is available on GitHub at https://github.com/faruto/Libsvm-FarutoUltimate-Version. The OSSA code, developed in this study, is available at the permanent link: https://figshare.com/articles/dataset/codes_data/13286099.
Acknowledgements
We express our sincere thanks to Dr. George Christakos, SI Guest editor, and three anonymous reviewers, as their valuable comments and suggestions greatly improved the quality of the paper. We express our sincere thanks to Guibing Zheng, for his assistance related to data collection during preparation of this manuscript. We thank American Journal Experts for its linguistic assistance during preparation of this manuscript.
Author information
Contributions
Xianzhe Tang: Software, Writing—original draft, Validation, Methodology, Conceptualization, Formal analysis. Jiufeng Li: Validation, Writing—review & editing, Formal analysis. Wei Liu: Resources, Supervision, Validation, Writing—review & editing, Formal analysis. Huafei Yu: Writing—review & editing. Fangfang Wang: Writing—review & editing.
Ethics declarations
Conflicts of interest
No potential conflicts of interest were reported by the authors.
About this article
Cite this article
Tang, X., Li, J., Liu, W. et al. A method to increase the number of positive samples for machine learning-based urban waterlogging susceptibility assessments. Stoch Environ Res Risk Assess 36, 2319–2336 (2022). https://doi.org/10.1007/s00477-021-02035-8
DOI: https://doi.org/10.1007/s00477-021-02035-8