Abstract
This paper applies information-gain-based feature selection to screen factors for landslide susceptibility evaluation in Wushan County, in the reservoir area of the Three Gorges Reservoir Hub. A comparative experiment on evaluation factor screening was conducted using the Bayesian information criterion method and the information gain method. The following conclusions could be drawn. In selecting regional landslide susceptibility evaluation factors, the information gain value can be used as a screening index. The screening method has two steps: the information gain value of each factor is calculated, and the evaluation factors are then selected according to the ranking of these values. The ratio of a single factor's information gain value to the sum of the information gain values of all factors, expressed as a percentage, can be used as the screening criterion. Because the information gain value has a clear meaning, namely the contribution of each evaluation factor to the accuracy of the evaluation model, and is easy to calculate, this index could play an important role in screening landslide susceptibility evaluation factors.
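The screening steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes every factor has already been discretized into classes, uses the standard entropy-based definition of information gain, and runs on a hypothetical toy data set. The factor names (`slope_class`, `lithology`, `aspect`) and the helper names (`entropy`, `information_gain`, `rank_factors`) are invented for this example. No cut-off percentage is applied, since the abstract does not state one.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(factor_values, labels):
    """IG = H(labels) - sum over classes v of p(v) * H(labels | factor = v)."""
    n = len(labels)
    groups = {}
    for value, label in zip(factor_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_factors(factors, labels):
    """Return (name, gain, share of total gain in %) sorted by decreasing gain."""
    gains = {name: information_gain(vals, labels) for name, vals in factors.items()}
    total = sum(gains.values())
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, g, 100.0 * g / total if total > 0 else 0.0) for name, g in ranked]


# Hypothetical toy data: 1 = landslide cell, 0 = stable cell.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
factors = {
    # Separates the two classes perfectly.
    "slope_class": ["steep"] * 4 + ["gentle"] * 4,
    # Partially informative.
    "lithology": ["a", "a", "a", "b", "a", "b", "b", "b"],
    # Carries no information about the label.
    "aspect": ["n", "s", "n", "s", "n", "s", "n", "s"],
}

for name, gain, share in rank_factors(factors, labels):
    print(f"{name}: gain={gain:.4f} bits, share={share:.1f}%")
```

Continuous factors such as slope angle or distance to a river would need to be binned before a calculation like this; the binning scheme is a separate modeling choice that this sketch leaves open.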
Acknowledgements
We thank the editors and anonymous reviewers for their valuable comments on the manuscript. We are grateful for the support provided by the Key Research and Development Program of Hubei Province: Research on Disaster Causing Mechanism and Key Risk Prevention Technologies of Major Landslides in Mountainous Areas Under Heavy Rain (No. 2021BCA219).
Ethics declarations
Consent to publish
I wish to declare on behalf of my coauthors that the work described is original research that has not been published previously and is not under consideration for publication elsewhere, in whole or in part. All the authors listed have approved the enclosed manuscript.
Conflicts of interest
No conflicts of interest exist in the submission of this manuscript, and the manuscript has been approved by all authors for publication.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, X., Chong, J., Lu, Y. et al. Application of information gain in the selection of factors for regional slope stability evaluation. Bull Eng Geol Environ 81, 470 (2022). https://doi.org/10.1007/s10064-022-02970-y
DOI: https://doi.org/10.1007/s10064-022-02970-y