Abstract
The quality of the input data plays an important role in statistical landslide susceptibility mapping, and the influence of different data types on the performance of the resulting maps merits study. The goal of this study was to explore the effects of three data types, namely presence-only (PO), presence-absence (PA), and pseudo-absence (PAs) data, on the predictive capability of landslide susceptibility mapping. To this end, we conducted a case study in the landslide-prone Honghe County in Yunnan Province, China. A total of 428 landslide PO data points were selected. An equivalent number of non-landslide locations were generated as PA data by random sampling, and 10,000 sites were uniformly selected at random from each region as PAs data. Three landslide susceptibility models, namely the information value model (IVM), the logistic regression (LR) model, and the maximum entropy (MaxEnt) model, corresponding respectively to the three data types, were investigated. The area under the receiver operating characteristic curve (ROC-AUC), seven statistical indices (accuracy, sensitivity, false-positive rate, specificity, precision, Kappa, and F-measure), and a landslide density analysis were used to evaluate model performance. Our results indicated that the MaxEnt model using PAs data performed best, with the highest ROC-AUC values and statistical indices, followed by the IVM using only landslide (PO) data and the LR model using PA data. Using PAs data avoided the inherent over-prediction of PO data by limiting the predicted area of high landslide susceptibility. In addition, the random sampling design of the landslide PA data increased the uncertainty of landslide susceptibility mapping and influenced model performance. Our results therefore suggest that PAs data sampling provides a useful data type in the absence of high-quality data. Finally, we summarized the principles, advantages, and disadvantages of the three data types to assist with model optimization and the improvement of predictive performance and fit.
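The seven statistical indices named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' implementation: the function name `confusion_indices`, the use of Cohen's kappa for "Kappa", the F1 form of the F-measure, and the example counts are all assumptions made for illustration, and the code assumes every denominator is non-zero.

```python
def confusion_indices(tp, fp, tn, fn):
    """Compute seven indices from confusion-matrix counts.

    tp/fn: landslide cells predicted as landslide / non-landslide.
    tn/fp: non-landslide cells predicted as non-landslide / landslide.
    Assumes all denominators are non-zero (illustrative sketch only).
    """
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)          # true-positive rate (recall)
    specificity = tn / (tn + fp)          # true-negative rate
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    precision = tp / (tp + fp)
    # F-measure taken here as the harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "false_positive_rate": false_positive_rate,
        "specificity": specificity,
        "precision": precision,
        "kappa": kappa,
        "f_measure": f_measure,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    for name, value in confusion_indices(tp=80, fp=20, tn=70, fn=30).items():
        print(f"{name}: {value:.3f}")
```

Because the false-positive rate is the complement of specificity, the two indices carry the same information; reporting both mainly serves readability when comparing models.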
Acknowledgments
This research was supported by the Multi-government International Science and Technology Innovation Cooperation Key Project of the National Key Research and Development Program of China for the ‘Environmental monitoring and assessment of LULC change impact on ecological security using geospatial technologies’ (Grant No. 2018YFE0184300), the National Natural Science Foundation of China (Grant Nos. 41271203, 41761115), and the Program for Innovative Research Team (in Science and Technology) in the University of Yunnan Province, IRTSTYN. The authors thank the anonymous reviewers for their helpful comments on the initial version of the manuscript. We are extremely grateful to the Land and Resources Bureau of Honghe County in Yunnan Province, China, for providing data.
Cite this article
Zhao, Dm., Jiao, Ym., Wang, Jl. et al. Comparative performance assessment of landslide susceptibility models with presence-only, presence-absence, and pseudo-absence data. J. Mt. Sci. 17, 2961–2981 (2020). https://doi.org/10.1007/s11629-020-6277-y
DOI: https://doi.org/10.1007/s11629-020-6277-y