A method to increase the number of positive samples for machine learning-based urban waterlogging susceptibility assessments

  • Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

The frequent occurrence of urban waterlogging seriously affects people’s lives and the national economy. Using machine learning (ML) methods to spatially assess urban waterlogging susceptibility is critical for reducing the losses caused by such disasters. Binary ML classifiers used for this evaluation should be trained on an equal number of positive and negative samples; in most cases, however, researchers can obtain only a relatively small number of historical waterlogging locations (positive samples), which limits the number of negative samples that can be selected and in turn affects the trained classifiers’ performance. To address this issue, we propose an optimized seed spread algorithm (OSSA) that estimates the potential inundated areas based on the spatial distribution of elevation and natural waters, thereby increasing the number of positive samples. The primary urban area of Guangzhou, China, was selected as the study region, and random forest (RF) was selected as the evaluation algorithm. We further employed two other ML methods, support vector machine (SVM) and logistic regression (LR), to verify the quality of the increased positive samples. The results indicate that, compared with the original positive samples, the OSSA-based positive samples achieve the highest area under the curve (AUC) values for all three tested ML methods, which suggests that the OSSA can be a suitable approach for increasing the number of positive samples in such studies. We believe that this study advances ML-based waterlogging susceptibility assessments, which could be valuable for developing countries where intensive hydrologic monitoring is lacking.
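
To make the general idea of spreading from a seed concrete, the following is a minimal sketch of a plain seed-spread (flood-fill) pass over an elevation grid. It is not the OSSA itself, whose details are not given in the abstract. The function name `seed_spread`, the `water_level` threshold, the 4-neighbour connectivity, and the `blocked` mask (a hypothetical way to exclude cells such as natural water bodies) are all assumptions made for illustration.

```python
from collections import deque


def seed_spread(elevation, seed, water_level, blocked=None):
    """Return the set of (row, col) cells connected to `seed` through
    4-neighbour steps whose elevation is at or below `water_level`.

    Illustrative only: `blocked` is a hypothetical set of cells that
    the spread may not enter (for example, cells to be excluded)."""
    rows, cols = len(elevation), len(elevation[0])
    blocked = blocked or set()
    # The seed itself must be floodable, otherwise nothing spreads.
    if seed in blocked or elevation[seed[0]][seed[1]] > water_level:
        return set()
    flooded = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue  # outside the grid
            cell = (nr, nc)
            if cell in flooded or cell in blocked:
                continue  # already visited or excluded
            if elevation[nr][nc] <= water_level:
                flooded.add(cell)
                queue.append(cell)
    return flooded


# Toy elevation grid (arbitrary units); the seed is a known waterlogging cell.
dem = [
    [5.0, 5.0, 5.0, 5.0],
    [5.0, 1.0, 1.2, 5.0],
    [5.0, 1.1, 4.0, 0.5],
    [5.0, 5.0, 5.0, 5.0],
]
flooded = seed_spread(dem, seed=(1, 1), water_level=1.5)
print(sorted(flooded))
```

In this toy grid the low cell at (2, 3) is not reached, because it is separated from the seed by higher ground. That connectivity constraint is what distinguishes a seed spread from a simple elevation threshold. Each cell returned by such a pass could then serve as a candidate additional positive sample.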

Data availability

The data for this study are available at the permanent link: https://figshare.com/articles/dataset/codes_data/13286099.

Code availability

The RF code, which is open source, is available on GitHub at https://github.com/belindee/RandomForcastToiletries. The LR code consists of built-in functions in MATLAB 2017a. The SVM and PSO code is available on GitHub at https://github.com/faruto/Libsvm-FarutoUltimate-Version. The OSSA code, developed in this study, is available at the permanent link: https://figshare.com/articles/dataset/codes_data/13286099.

Acknowledgements

We express our sincere thanks to Dr. George Christakos, SI Guest editor, and three anonymous reviewers, as their valuable comments and suggestions greatly improved the quality of the paper. We express our sincere thanks to Guibing Zheng, for his assistance related to data collection during preparation of this manuscript. We thank American Journal Experts for its linguistic assistance during preparation of this manuscript.

Author information

Contributions

Xianzhe Tang: Software, Writing—original draft, Validation, Methodology, Conceptualization, Formal analysis. Jiufeng Li: Validation, Writing—review & editing, Formal analysis. Wei Liu: Resources, Supervision, Validation, Writing—review & editing, Formal analysis. Huafei Yu: Writing—review & editing. Fangfang Wang: Writing—review & editing.

Corresponding author

Correspondence to Wei Liu.

Ethics declarations

Conflicts of interest

No potential conflicts of interest were reported by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Tang, X., Li, J., Liu, W. et al. A method to increase the number of positive samples for machine learning-based urban waterlogging susceptibility assessments. Stoch Environ Res Risk Assess 36, 2319–2336 (2022). https://doi.org/10.1007/s00477-021-02035-8

  • DOI: https://doi.org/10.1007/s00477-021-02035-8
