Abstract
Compiling an inventory is a fundamental step in assessing landslide hazards. However, data of sufficient quantity and quality are not always available. This study therefore proposes an approach for building a landslide inventory from textual telephone records and for mapping landslide hazard in an urban area. A total of 40,792 textual records were classified with the naive Bayes algorithm, and the classified records form the landslide inventory. Once the inventory was created, the random forest algorithm with 12 conditioning variables was used to map landslide hazard. The text classification model obtained an accuracy of 0.8671 and a Kappa index of 0.8038. The hazard mapping model obtained an accuracy of 0.9503 and an area under the receiver operating characteristic curve (AUC-ROC) of 0.9870. The model's results were also compared with actual landslides described in news reports and were found to be close to what had happened, thus demonstrating the ability of the proposed approach to predict landslides. Finally, the proposed approach can be used in simulation environments, thereby supporting strategic decision-making associated with hazard analysis.
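The text classification stage and its two reported metrics (accuracy and the Kappa index) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a multinomial naive Bayes variant with add-one (Laplace) smoothing and whitespace tokenization, and the records, class labels, and function names below are invented for illustration only.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(records):
    """Count classes and words from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in records:
        class_counts[label] += 1
        for token in text.lower().split():
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior for one record."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total)  # log prior
        # Add-one smoothing: every vocabulary word gets one pseudo-count.
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in text.lower().split():
            if token in vocabulary:  # skip words never seen in training
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance; undefined if chance agreement is 1."""
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_freq, pred_freq = Counter(y_true), Counter(y_pred)
    expected = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical call records (invented, not taken from the study's data).
TRAIN = [
    ("slope collapsed behind the house after heavy rain", "landslide"),
    ("mud and soil slid down the hillside onto the street", "landslide"),
    ("retaining wall cracked and earth is sliding", "landslide"),
    ("street flooded and water entered the house", "flood"),
    ("canal overflowed and water covers the avenue", "flood"),
    ("tree fell on the power line", "other"),
    ("large branch fell and blocked the road", "other"),
]
TEST = [
    ("earth sliding down the slope after rain", "landslide"),
    ("water flooded the avenue", "flood"),
    ("tree blocked the road", "other"),
]

model = train_naive_bayes(TRAIN)
y_true = [label for _, label in TEST]
y_pred = [predict(model, text) for text, _ in TEST]
print("accuracy:", accuracy(y_true, y_pred))
print("kappa:", cohen_kappa(y_true, y_pred))
```

Records predicted as landslides would then make up the inventory that feeds the hazard mapping stage. Kappa is reported alongside accuracy because it discounts the agreement expected by chance, which matters when the classes are unevenly represented.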
Funding
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. It was also supported by the National Council for Scientific and Technological Development (CNPq), process numbers 305792/2017-2, 305119/2017-6, and 421851/2018-0.
Cite this article
Rodrigues, S.G., Silva, M.M. & Alencar, M.H. A proposal for an approach to mapping susceptibility to landslides using natural language processing and machine learning. Landslides 18, 2515–2529 (2021). https://doi.org/10.1007/s10346-021-01643-3
DOI: https://doi.org/10.1007/s10346-021-01643-3