Background and introduction

Landslides, either triggered by an earthquake or rainfall (García-Rodríguez et al. 2008), pose costly and deadly threats to the mountainous country (Nowicki et al. 2014; Robinson et al. 2017). It subsequently damages the houses and basic infrastructures, economics, and human prosperity (Corominas et al. 2014) and is considered as vital geological hazard in the high mountainous regions (Nefeslioglu et al. 2008). Earthquake alone triggers a lot of landslides (Xu and Xu 2012; Xu et al. 2013a) in the mountainous region (Kamp et al. 2008; Fan et al. 2019) consequently remarked as subordinate hazards of an earthquake (Keefer 1994, 2002; Li et al. 2014). Notably, the 25 April 2015 Gorkha earthquake of Nepal and its aftershock not only responsible for the death of nine thousand people and economic losses of billions of US dollars (GoN 2015) but also triggered more than 25,000 landslides (Roback et al. 2018), which included small-scale rock falls to valley blocking landslides in several areas (Kargel et al. 2016). The landslides obstructed the accessibility to earthquake-affected areas hindering the post-earthquake relief activities (Kargel et al. 2016) multiplying the effects of the earthquake.

Despite, earthquake, an external factor, triggered many landslides (Pourghasemi and Rahmati 2018), several conditioning factors are responsible for occurrence of landslides including topography, geology, and environmental parameters (Süzen and Kaya 2012). Anthropogenic interactions including rapid infrastructure development (Petley et al. 2007) in combination with ground shaking becomes a major triggering factor for landslide occurrence (Süzen and Kaya 2012) in mountainous region (Robinson et al. 2017). In this regards, detailed analysis of earthquake-triggered landslide following an earthquake supplements the knowledge to understand the total impact of the earthquake (Marzorati et al. 2002; Robinson and Davies 2013). Landslide susceptibility mapping (LSM) identifies landslide sensitive areas considering the relationship between causal factors and landslide (Fell et al. 2008), it is a key step for landslide hazard monitoring and mitigation (Pourghasemi and Rahmati 2018). Furthermore, LSM is important for safety planning, disaster management, and future planning of earthquake stuck areas (Xu et al. 2013a).

Many of the developed countries have adopted different technologies to understand the landslide risk of the earthquake which has been lacking in the Nepalese context (Merghadi et al. 2020). Despite several studies have been conducted regarding the landslides and their distribution after an earthquake (e.g., Collins and Jibson 2015; Goda et al. 2015; Gallen et al. 2017; Kargel et al. 2016; Robinson et al. 2017; Roback et al. 2018), there is still a gap in research regarding landslide and its damages due to the earthquake (Kamp et al. 2008) mostly in developing countries including Nepal.

Multiple approaches have been adopted for LSM (Corominas et al. 2014) including heuristic approach (e.g., van Westen et al. 1999; Yalcin and Bulut 2007; Kouli et al. 2010), statistical approach (e.g. Ayalew and Yamagishi 2005; Kamp et al. 2008; Bai et al. 2010; Mancini et al. 2010; Xu et al. 2013a, b), and deterministic approach (e.g. Zhou et al. 2003). Broadly, it can be classified as qualitative and quantitative approach (Ayalew and Yamagishi 2005; García-Rodríguez et al. 2008; Corominas et al. 2014). In the heuristic analysis, the weight assignment is based on the expert opinion (Fell et al. 2008; Xu et al. 2013b; Corominas et al. 2014) as a qualitative approach (Ayalew and Yamagishi 2005). Both deterministic and statistical methods are quantitative approaches and are based on numerical expression of causal factors and landslides (García-Rodríguez et al. 2008). The statistical method is based on the quantitative analysis of relationships between causal factors and landslide events (Ayalew and Yamagishi 2005; Fell et al. 2008).

Several statistical approaches including bivariate probabilistic (e.g. Magliulo et al. 2008; Yilmaz 2009), multivariate as logistic regression (LR; e.g. Dai and Lee 2003; Ayalew and Yamagishi 2005; Lee and Pradhan 2007; García-Rodríguez et al. 2008; Kamp et al. 2008; Bai et al. 2010; Mancini et al. 2010; Pradhan and Lee 2010; Xu et al. 2013a; Aditian et al. 2018), artificial neural network (ANN; e.g. Nefeslioglu et al. 2008; Yilmaz 2009; Pradhan and Lee 2010; Bui et al. 2012; Aditian et al. 2018), and support vector machine (SVM; e.g. Yao et al. 2008; Xu et al. 2012; Gautam et al. 2021) are commonly used for LSM. However, the consensus regarding best suiting method for mapping has yet to be established (García-Rodríguez et al. 2008; Corominas et al. 2014). Among them, LR can define the multivariate relationship of causal factors and landslide (Ayalew and Yamagishi 2005; Lee and Pradhan 2007) which logically can assign the weight depicting the most influencing causal factor as well as disregarding the less significant factors during the mapping process (Budimir et al. 2015). Hence, it produces more reliable results (Nandi and Shakoor 2010). It is for that reason, LR approach is considered to be more suitable and is common in LSM (Nandi and Shakoor 2010; Budimir et al. 2015) especially on regional scale (e.g., Mancini et al. 2010; Pradhan and Lee 2010; Xu et al. 2013a; Corominas et al. 2014).

In the wake of Gorkha earthquake (Mw 7.8 at the epicenter of N 28.1470° and E 84.7080°—Fig. 1), we studied the earthquake-induced landslides in Upper Indrawati Watershed of Nepal which is severely affected by the earthquake. The study area is in the North of the small town called Melamchi (see in Fig. 2), where the effects due to the quake were reported severe in the study by Goda et al. (2015). Thus, our research aims to assess the impact of causal factors for the occurrence of landslides and susceptibility mapping to identify landslide sensitivity of the study area. We used descriptive statistics and logistic regression approach integrating with GIS for LSM (e.g., García-Rodríguez et al. 2008; Kamp et al. 2008; Bai et al. 2010; Aditian et al. 2018). This study could also be replicated in other catchment areas of high mountainous regions for susceptibility mapping.

Fig. 1
figure 1

Map of study area (Indrawati watershed): a MCT (modified after Dahal 2006), earthquake epicenter (at Main Himalayan Thrust), and b rainfall-induced landslides adopted from Gautam et al. (2021) and earthquake-induced landslides along with normal fault and thrust

Fig. 2
figure 2figure 2

a Landslide causative factor's map of the study area. b Landslide causative factor's map of the study area

Study area

This research is conducted in Upper Indrawati Watershed (85°33′N–85°44′N; 27°49′E–28°07′E) of Sindhupalchok district, located 40 km north-east of Kathmandu, Nepal (Fig. 1). We did not analyze the area with an altitude of more than 4000 m because of the lack of geological data (no detailed data available in the Department of Mines and Geology, Nepal), and unavailability of qualitative satellite images because of cloudy weather condition and snow-capped areas. Therefore, the altitude of the area we analyzed ranges from 796 to 4000 m with a coverage of 364 km2 area. The slope gradient ranges from 0° to 73° with the mean slope of 31.6°.

The study area belongs to higher Himalayan zone (Pre-Cambrian) geology dominated by Sermanthang formation and Dhad Khola gneiss formation covering 35% and 24% area respectively (DMG 2021). Sermanthang formation mostly covers the high-altitude region of the study area which includes lithology of interbedded feldspathic schist, augen gneiss, quartzite, and biotic-feldspathic schist. However, Dhad Khola gneiss covers the lower altitudinal region including porphyroblastic gneiss, augen gneiss with a thin band of quartzite and schist, and migmatitic gneiss lithology. The study area located in between the Main Central Thrust (MCT) which passes throughout the Himalayan range in Nepal (Dahal 2006) and is a part of Main Himalayan Thrust where the Gorkha earthquake of magnitude Mw7.8 occurred on April 24, 2015, followed by hundreds of aftershocks including Mw7.3 at ~ 120 km east of the epicenter (Sakai et al. 2016—Fig. 1). These tremors increased the earthquake severity considerably in the study area (Goda et al. 2015) which amplified likely occurrences of landslide. The study area is located nearly 30 km east of the MCT that traced throughout the Himalayan region and (3–6) km northwest of another local MCT (Dhital et al. 2002; DMG 2021) (Fig. 1). A normal fault (southern side of the study area as in Fig. 1) and local thrusts (reverse fault) are located within the study area (DMG 2021).

Landslide distribution

Gorkha earthquake followed by hundreds of aftershocks (Parameswaran et al. 2015) triggered many landslides in the study area. Landslide inventory is key to understand the mechanism of landslide occurrences and preparation of landslide hazard map (Fan et al. 2019). Field observation, on-screen detection of aerial photographs, visual interpretation, and on-screen digitization of high-resolution satellite images are effective and applicable for co-seismic landslide detection (Keefer 2002; Harp et al. 2011; Corominas et al. 2014; Xu et al. 2015). This is because the number, area, and volume of coseismic landslides are extensive (e.g., Keefer 1994; Marc et al. 2016), and landslides location are spatially distributed over the larger areas (Dai et al. 2011). Even though its time consuming, it is more accurate than point-based on susceptibility mapping (Xu et al. 2012; Fan et al. 2019). Thus, we conducted onscreen digitization of post-earthquake satellite imageries available on Google Earth after the Gorkha earthquake and its aftershock in polygon format. We avoided the inclusion of rainfall-induced landslides in the study by comparing the pre-and post-earthquake satellite imageries (Roback et al. 2018). In doing so, the inventory of rainfall-induced landslides of the Gautam et al. (2021) were overlaid in the Google Earth and only the landslide triggered by ground shaking were digitize manually on Google Earth satellite imageries from 2nd to 25th May 2015 (Fig. 1). We mapped 402 co-seismic landslides covering total area of 2.75 km2 in polygon format having an average area of 6891.83 m2 with minimum and maximum area of 169.62 m2 and 158,313 m2, respectively. Furthermore, it was aided by field observation from Nov. 15 to Dec. 5, 2015.

Landslide distribution was expressed by landslide relative density (LRD) (Ayalew and Yamagishi 2005), landslide number abundance (LNA) (Keefer 2000; Xu et al. 2015), and landslide area abundance (LAA) (Dai et al. 2011; Xu et al. 2014, 2015) which helps to understand the effects of causal factors on landslide occurrences. LRD is the ratio of the frequency ratio value (FRV) of a class to the total FRV of that causal factor and is calculated as suggested by Ayalew and Yamagishi (2005). LNA is the number of landslides per square kilometer, and LAA is the percentage of area affected by landslides (Xu et al. 2015).

Landslide causal factors

The occurrence of landslides depends upon multiple causal factors. Even though seismic shaking is the driving force for the earthquake-triggered landslides, local (natural and anthropogenic) factors play a dominant role (Kamp et al. 2008; Fan et al. 2019). Selection of the appropriate causal factors is essential for susceptibility mapping (Xu et al. 2013a; Merghadi et al. 2020), and there are not any agreed criteria (Budimir et al. 2015). Casual factors should be measurable, varies spatially (Ayalew and Yamagishi 2005; Budimir et al. 2015), and have a certain degree of compatibility with the dependent variable (García-Rodríguez et al. 2008). Hence, we selected 13 factors for mapping which includes slope angle, aspect, elevation, geological formation, land cover, distance to river, distance to road, soil type, total curvature, seismic intensity (PGA), topographic wetness index (TWI), distance to fault, and flow accumulation (Table 1). All the causal factors except categorical variables were categorized from four to ten classes to understand their effect for landslide occurrences due to earthquake (Table 1), whereas we preserved the category of all the categorical variables (aspect, formation, land cover, soil type, and PGA).

Table 1 Causal factors and their significance for landslide occurrence with the references

The categorical factors were dealt by creating dummy variables whereas the non-categorical factors with continuous data were dealt as they are (Nefeslioglu et al. 2008) for the purpose of susceptibility mapping (Fig. 2). As the scales of the input variables (casual factors) are different, the input data was normalized from 0 and 1 in order to increase the speed and accuracy of data processing, using Eq. 1 following the study of Nefeslioglu et al. (2008).

$${\text{X}}_{{{\text{norm}}}} = \frac{{{\text{X}}_{{\text{i}}} - {\text{X}}_{\min } }}{{{\text{X}}_{\max } - {\text{X}}_{\min } }}.$$
(1)

where Xnorm is normalized value of Xi input data, Xi is the input data that should be normalized, Xmax is the maximum value of the input data, and Xmin is the minimum value of the input data.

Preparation of training and validation landslide

All the inventoried landslides were divided as training landslide and validation landslide for LR modeling to perform susceptibility mapping and validating to access the quality of the model, respectively (Aditian et al. 2018; Ba et al. 2018; Huang and Zhao 2018). As there are no agreed criteria for the selection of training and validation landslides (Xu et al. 2013b), most of the researcher employed a random selection approach for susceptibility mapping (e.g., Ayalew and Yamagishi 2005; Mancini et al. 2010; Chen et al. 2016; Aditian et al. 2018; Pourghasemi and Rahmati 2018; Huang and Zhao 2018). Hence, we employed random selection approach with a ratio of 80:20 for the training and validation landslides (e.g., Bai et al. 2010; Gautam et al. 2021) (Fig. 3).

Fig. 3
figure 3

Distribution of training landslides and validation landslides

All the mapped 402 co-seismic landslides polygon were rasterized in GIS platform in 12.5 m pixels for analysis, resembling the causal factors resolution as determined by DEM. Among 17,554 pixels of landslide, 14,043 (80%) were used as training data to train the model whereas 3511 pixels (20%) were used to validate the model. As LSM needs the representation of no landslide areas, the equal number of landslides absence pixel (0) from no-landslide zone of the study area for both training and validation datasets were selected randomly (Xu et al. 2013a).

Landslide susceptibility mapping

We used LR, a multivariate statistical approach for LSM in the study area (eg., Ayalew and Yamagishi 2005; García-Rodríguez et al. 2008; Bai et al. 2010; Mancini et al. 2010; Xu et al. 2013a; Abedini et al. 2017; Aditian et al. 2018). LR is also called as generalized linear model (GLM) for binary response variables (Hosmer and Lemeshow 2000). It allows describing the effect of all the independent variables on dependent variables in the form of a linear regression equation (Atkinson and Massari 1998). It has a distinct advantage in which the independent variables can be either continuous or (and) discrete as if the link function is added to usual linear regression model (Bai et al. 2010), and not necessarily needed to normally distributed (García-Rodríguez et al. 2008; Bai et al. 2010). LR analysis is based on the analysis of dependency of a binary dependent variable having possible outcome of either 1 (landslide) or 0 (no landslide) to independent variables (landslide causal factors) (Mancini et al. 2010; Budimir et al. 2015). It calculates the probability of specific event occurrences, landslide occurrence in this case as in the study by Ayalew and Yamagishi (2005) which can be expressed as in Eq. (2).

$${\text{P}} = 1/(1 + {\text{e}}^{{ - {\text{Z}}}} )$$
(2)

where P is the probability of landslide occurrence, e is the exponential function and Z is the logit value which is expressed by a linear equation as;

$${\text{Z}} = {\text{b}}_{0} + {\text{b}}_{1} {\text{x}}_{1} + {\text{b}}_{2} {\text{x}}_{2} + \cdots + {\text{b}}_{{\text{n}}} {\text{x}}_{{\text{n}}}$$
(3)

where b0 is the intercept, b1, b2,…bn are the coefficient of landslide causal factors x1, x2,…xn respectively, n is the number of causal factors. The linear model is logistic regression and represents the presence and absence of landslides on independent variables (Bai et al. 2010).

A data matrix is devised including 40 variables with eight continuous and 32 classes of five categorical variables in binary format to perform the LR. The data were arranged in such a way that each row datum represents an individual case expressed in pixel, and columnar data show the dependent and independent variables (Gautam et al. 2021). The "glm" package in R 3.5.2 was employed for GLM (Team 2019). Provided results were the coefficient of causal factors by the process of maximum likelihood criterion (Mancini et al. 2010).

Susceptibility scale of the region is relative (Fell et al. 2008) hence, obtained probability value were categorized in five discrete classes as susceptibility categories (very low, low, medium, high, and very high) using the natural breaks (Jenks) method in ArcGIS (e.g., Chalkias et al. 2014; Aditian et al. 2018; and Gautam et al. 2021).

Map validation

The overall performance of the mapping approach is assessed by identifying the correctly classified landslides under the susceptibility categories of LSM (Xu et al. 2013b). The area under the curve (AUC) value was calculated to predict the accuracy of the model quantitatively (Akgun 2012; Abedini et al. 2017; Ba et al. 2018; Gautam et al. 2021). The AUC value suggests the model's quality for reliable prediction of the existence and non-existence landslides (Aditian et al. 2018). The larger the AUC values, the higher the accuracy of the model's performance (Corominas et al. 2014). Additionally, the values close to 1.0 indicating perfect fit is the ideal condition whereas, close to 0.5 indicates the random fit (Carvalho et al. 2014).

We compared the LSM with both training and validation landslides (Gautam et al. 2021). The curve obtained by overlaying susceptibility map with training landslide suggests the model's capability of classifying the area (Mancini et al. 2010; Xu et al. 2012) and also called as success rate curve (Chung and Fabbri 2003; Aditian et al. 2018; Ba et al. 2018); and with validation landslide, it is called prediction rate curve (Lee and Pradhan 2007; Ba et al. 2018) suggesting the model's ability of landslide prediction (Chung and Fabbri 2003; Aditian et al. 2018). The AUC value generated from the success rate curve also indicates the correlation between dependent and independent variables in LR analysis (Mancini et al. 2010).

Furthermore, we used seed cell area index (SCAI) to validate the susceptibility map (Suzen and Doyuran 2004; Abedini et al. 2017; Nicu and Asăndulesei 2018). The SCAI illustrates the landslide density on susceptibility class (Suzen and Doyuran 2004) and is the ratio of the percentage of the area in the susceptibility category to the percentage area of landslides in that category (Abedini et al. 2017). The SCAI reflects the accuracy of the mapping approach qualitatively (Abedini et al. 2017). In an ideal situation, the SCAI value is decreasing form very low to very high susceptibility classes (Kincal et al. 2009; Sdao et al. 2013; Chen et al. 2016; Abedini et al. 2017; Nicu and Asăndulesei 2018).

Landslides distribution

Landslides in the study area were clustered within the Northern part (Fig. 4). This is because of the dominated steeper slope and higher elevation topology in the northern region. Most of the earthquake-induced landslides are rock and debris falls following the category of Varnes (1978) classification (Owen et al. 2008). We found an adequate number of rockfalls in the study area as the landslide concentration is higher in the northern part which is subjugated by the rocky slope. In this study, we did not include the debris flows, following the study of Varnes (1978) as the mechanism of debris flow is entirely different from other types of mass movement. Our study support that the distribution of earthquake-induced landslides and rainfall-induced landslide is comparable as earthquake-induced landslides is clustered along the sharper slope, whereas the rainfall-induced landslides distributed evenly throughout the flatter slope where higher anthropogenic pressure is observed (Gnyawali et al. 2020; Gautam et al. 2021).

Fig. 4
figure 4

Landslide density (number [#] of landslide per km2) of the study area due to the Gorkha earthquake of 2015

We assessed the concentration of co-seismic landslides in this study corresponding to the study of Kargel et al. (2016). We categorized the area into eight categories considering the landslide number per km2. Nearly 40% of the area is not affected by earthquake-induced landslides; with a frequency less than 0.5/km2 (Fig. 4). Only 5% of the area had landslide density (expressed regarding landslides number/km2) of more than 40 (Fig. 4), indicating the higher landslide zone. However, the area having landslides 10–20 per km2 and 20–40 per km2 covered the considerable area with 11% and 7% respectively.

The LRD, LNA, and LAA of the classes of causal factors are presented in Fig. 5a–m. LRD, LAA, and LNA are increased with slope angle reaching a maximum of 44%, 2.97%, and 4.34/km2 respectively at a slope class more than 50 degree (Fig. 5a). The LRD, LAA, and LNA of southern aspect (southeast, south, and southwest) were higher with the value reaching maximum of 21%, 1.25%, and 1.83/km2 respectively (Fig. 5b). The relationship of elevation and landslide frequency showed that the higher LRD (= 31%), LAA (= 1.15%) and LNA (= 1.68/km2) were found in the elevation range (2400–3000) m and were decreased in either lesser or higher elevation classes (Fig. 5c). Both concave and convex curvature had higher landslide distribution with slightly more elevated in concave curvature reaching a maximum value of LRD (= 25%), LAA (= 1.79) and LNA (= 2.61/km2) in curvature class (= -34.56 to -3.99) among the 6 classes generated in ArcGIS using natural breaks method (Fig. 5k).

Fig. 5
figure 5

Landslide distribution according to the classes of causal factors (a slope [< 10, 10–20, 20–30, 30–40, 40–50, > 50], b aspect [North, NE, East, SE, South, SW, West, NW], c elevation [794–1300, 1300–1800, 1800–2400, 2400–3000, 3000–3600, > 3600], d geological formation [Sermanthang, Simpani, Hadi Khola, Dhad Khola, Gyalthung, Pangang], e land cover [Needle leaved closed forest, Needle leaved open forest, Broad leaved closed forest, Broad leaved open forest, Shrubland, Grassland, Agriculture], f distance to the river [< 100, 100–200, 200–400, > 400], g distance to road [< 678, 678–1733, 1733–2869, 2869–3965, 3965–5216, 5216–6573, 6573–7930, 7930–9338, 9338–10955, 10955-13303], h soil type [Humic cambisols, Gelic leptisols, Eutric regosols, Gleyic cambisols, Eutric cambisols, Chromic cambisols], i peak ground acceleration, [< 20%, 20–30%, 30–40%, 40–50%] j topographic wetness index [0–88–4.12, 4.12–5.38, 5.38–6.85, 6.85–9.20, 9.20–18.00], k curvature [–34 to –3.99, –3.99 to –1.37, –1.37 to 0.38, 0.38 to 2.41, 2.41 to 6.20, 6.20 to 39.39], l distance to fault [< 1088, 1088–2520, 2520–4180, 4180–6070, 6070–8188, 8188–10707, 10707–14544], and m flow accumulation[< 220, 220–329, 329–549, 549–993, 993–1885, 1885–3618, 3618–7297, 7297–14575, 14575–29225, 29225–58714]). The bar graph represents the LRD in percent shown in the primary y-axis. The dotted line with round marker represents LAA (%), and line with triangular marker represents LNA (landslide #/km2); depicted in the secondary y-axis

Pangang formation of higher Himalayan zone (Pre-Cambrian) is found in the northernmost area of the watershed (Fig. 3), which has higher landslide distribution as LRD (= 41%), LAA (= 1.71%), and LNA (= 2.50/km2) followed by Sermanthang formation (Fig. 5d). Hence, gneiss, schist and quartzite lithology of the higher Himalayan group were highly susceptible due to the earthquake-induced landslide in the study area. Humic cambisols, among the 6 category of soil types found in the study area, has higher distribution of landslides with LRD, LAA, and LNA value of 34%, 1%, and 1.45/km2 respectively followed by Eutric regosols (Fig. 5h).

Distribution of earthquake-induced landslides in the open forest is higher in the study area followed by shrubland reaching the maximum value of 31%, 1.16%, and 1.70/km2 for LRD, LAA, and LNA respectively (Fig. 5e). Landslide frequency in the closed forest was also considerably higher compared to agricultural land in the study area. The reason behind this could be most of the forested area is distributed throughout the higher slope gradient as we overlaid the land cover class with slope gradient.

Among the four classes of distance to the river, class (100–200) m has higher landslide distribution with the value of 38%, 1.11%, and 1.62/km2 for LRD, LAA, and LNA respectively followed by the class (200–400) m (Fig. 5f). Landslide distribution is increasing with increasing distance to road until 6500 m, reaching LRD (= 21%), LAA (1.76%), and LNA (= 2.56/km2) and is decreasing further (Fig. 5g). In case of causal factor distance to the fault, class (3.3–4.2) km out of ten classes generated in ArcGIS (natural breaks method), have higher LRD, LAA and LNA with respective value of 50%, 3.03% and 4.43/km2 with gradually decreasing on either side of the classes (Fig. 5l).

As we were mapping seismic induced landslides susceptibility, we considered seismic intensity, which varied spatially throughout the study area, as a causal factor. LRD (= 42%), LAA (= 0.99%) and LNA (= 1.45/km2) were found higher in the area which had low seismic intensity (20% peak ground acceleration, PGA) followed by the PGA class 50% (Fig. 5h) among the four classes.

The distribution of landslides were gradually decreasing with increasing value of TWI (Fig. 5j). The TWI class with lesser index ranging from 0.9 to 4.1 has the highest LRD (= 29%), LAA (= 0.96%), and LNA (= 0.88/km2) suggesting that the chances of landslide occurring within that class is higher. We found similar trend with flow accumulation (Fig. 5m).

Landslide susceptibility modelling

All together 19 variables of causal factors slope, distance to fault, TWI, aspect, landcover, soil type (one class), and PGA (one class) show the positive association for landslide occurrences in the study area among the 40 variables of 13 causal factors (Table 2). We found that landcover (except bare area) classes and slope were the best predictors for landslide occurrences with the coefficient value of >10 and 5.35, respectively.

Table 2 Logistic regression (LR) coefficients of landslide causal factors indicating the relative significance for landslide occurrence, and constant (Intercept)

Even though many of these landslides would not occur any time soon without the earthquake shaking, we found that only two classes of PGA (PGA 20% = 0.66 and PGA 40% = -2.64) have shown association (statistically) for landslide occurrence in the study area. The negative association on PGA class 40% is the second most important variables having negative association for landslide occurrences after elevation (− 4.37).

The probability values of landslide occurrences in each pixel were ranges from 0.001 to 0.987. The final susceptibility map with susceptibility categories–very low, low, medium, high, and very high–is presented in Fig. 6. The area coverage according to the five susceptibility categories are 50.16% (182.76 km2), 22.55% (82.18 km2), 13.79% (50.24 km2), 9.34% (34.04 km2), and 4.16% (15.17 km2), respectively from very low to very high.

Fig. 6
figure 6

Simplified map of landslide susceptibility due to earthquake depicting the susceptibility classes in the study area

The land having higher slope gradient with elevation ranges from 2400 to 3000 m are highly sensitive for an earthquake-induced landslide in future (Fig. 6). Similarly, Pangang and Sermanthang formation of higher Himalayan (Pre-Cambrian) geology including schist, quartzite and gneiss lithology are highly susceptible. Last but not least, the area having less population density is highly vulnerable to the landslides due to the earthquake.

Map validation

The AUC values of the success rate and prediction rate curves were 0.843 and 0.832 respectively (Fig. 7) suggesting good accuracy. The AUC value of success rate curve suggests that the approach have the capability of 84.3% to classify landslide correctly under the susceptibility categories as well as there exists a good correlation between dependent (landslide) and independent (causal factors) variables. The AUC value of the prediction rate curve represents that LSM approach able to correctly predict future landslides by 83.2%. The SCAI values are decreasing from very low to very high susceptibility classes (Table 3) suggesting the mapping approach produces a susceptible map with satisfactory accuracy.

Fig. 7
figure 7

Success rate (a) and prediction rate (b) curves with AUC values

Table 3 Distribution of landslide and SCAIs in the susceptible categories

Discussion

Understanding overall impact of earthquake hazards require the understanding of several features of co-seismic landslide (Robinson and Davies 2013). It includes the landslide number, area, and volume (e.g., Keefer 1984; Marc et al. 2016) and the location of landslides (e.g., Dai et al. 2011). Hence, we studied the distribution of co-seismic landslide (landslide number, area, and volume and spatial location of landslides) caused by earthquake and employed the landslide susceptibility mapping approach using LR considering several causal factors in a catchment of mountainous area of Nepal.

Landslide susceptibility differs from quake to quake (Kargel et al. 2016). This study concurred with the study of García-Rodríguez et al. (2008) that the landslide density is increasing with the slope gradient. Likewise, the recent study by Kargel et al. (2016) of Nepal found that the earthquake frequencies are higher in the area where slope gradient is more than 300 which is similar to our study. In addition, the topographical variables including slope found to be significant variables for co-seismic landslides in the study by Fan et al. (2021). However, our study contradicts the study from Kamp et al. (2008) as they found that the landslide relative density is higher in slope 25°–35° and decrease on either side. Similarly, the study from Roback et al. (2018) also did not found any correlation between landslide occurrences and steepest slopes. Different surficial geology, mostly dominated by a rocky hill which is brittle and more susceptible for shaking, and land cover in the steepest slope gradient might play an essential role for increasing landslide distribution in steeper slope in this study. This study also concurs with the study of García-Rodríguez et al. (2008) that landslide distribution is low at the lower elevation because of the gentle terrain. Likewise, the landslide distribution with respect to elevation is similar with the study by Roback et al. (2018) where the mid elevation has higher frequencies of landslides. Similarly, our study found an analogous result with Kamp et al. (2008) that southeast, south, and southwest aspect have higher landslides distribution.

As Xu et al. (2015) suggested in their study that only seismogenic faults affect a co-seismic landslide, this study also found the similar result that there is no clear correlation between the distance to the fault line and landslide distribution. The distribution of landslides are random without considering the physical properties of the fault. However, landslide frequencies are increased as the distance from faults line decrease (Xu et al. 2015) because the rock strength weakens due to the fault line (Osmundsen et al. 2009).

We found types of metamorphic rock including schist, quartzite, and gneiss of Pangang formation of the higher Himalayan zone have a higher frequency of landslides which is harmonize with the study by Kargel et al. (2016). The schist of the higher Himalayan zone along with Proterozoic phyllite, amphibolite, meta-sandstone of lesser Himalaya has higher landslide frequency due to the Gorkha earthquake (Kargel et al. 2016). In general, soft sediment (metamorphic rock) usually amplifies the energy of seismic waves (Chamlagain and Gautam 2015) and amplifies frequencies of the landslide. As weaker soil types have low elasticity and can amplify seismic waves and undergo a greater displacement (Hovius and Meunier 2012), our study find the similar result that the delicate soil type of mountainous regions including Humic Cambisols and Eutric Regosols (Driessen 1991) have higher density of landslide.

The landslide abundance is positively correlated with the seismic intensity (Xu et al. 2015) and is considered as the important factor for landslide occurrence due to the earthquake (Wang et al. 2003; Xu et al. 2014). Hence, we used PGA, a triggering factor of earthquake-induced landslides, as a causal factor for landslide study and susceptibility mapping which has been scarcely used in LR based LSM despite it is the reason for earthquake-induced landslide (Budimir et al. 2015). However, as in the study by Xu et al. (2015), Fan et al. (2021), and Roback et al. (2018), our study found that there is no correlation between the PGA and landslide distribution in the study area. The PGA is slightly reduced with steeper slope in our study which coincides with the study by Roback et al. (2018) where they found the modelled PGA of Gorkha earthquake reduces slightly for the steepest terrain and is weakly constrained for landslide occurrences. The landslide frequencies in this study area are higher in lower PGA as in the study by Roback et al. (2018). However, the study by Xu et al. (2014) suggests that the landslide density increases with increasing PGA. Relatively steeper slope and diverse lithology is the significant attributes of the study area where PGA value is lower and has experienced the majority of landside with higher density. Hence, the higher slope, steeper relief, and increased terrain roughness is positively correlated to the seismic landslide distribution (Roback et al. 2018).

Regarding the association of land cover classes and landslide, we found that the forested area has a higher frequency of landslide which is contrasting with the result of the study by Kamp et al. (2008) as they found that grassland and agricultural land have higher earthquake-triggered landslide densities than forest land. The reason for this contrasting result might be dependent with slope distribution as the forested area have higher slope. In addition, temporal change in land cover induced by human activities could be another reason. However, we have not consider the land cover changes from 2010 to 2015 (earthquake date) in our analysis despite landslide occurrences is significantly attributed by the land cover change trajectories as per the study by Guns and Vanacker (2013).

Even though, LSM is the key for hazard mitigation and Nepal ranked among the top of the list with the highest percentage of the landslide reports, it is not listed among the top 18 countries where machine learning (ML) approaches including LR for LSM studies have been used (Merghadi et al. 2020). Hence, we conducted LSM using LR among the different ML approaches available in the literature (Budimir et al. 2015; Merghadi et al. 2020). Among the ML approaches, LR belonged among the 10 most popular and has been considered widely by the researchers (Nowicki et al. 2014), despite several new computing techniques have been in use. In addition, LR is simple and easy techniques and has been popular and widely used in recent times as the integration of GIS in susceptibility mapping has eased the process and improved the effectiveness (Budimir et al. 2015; Huang and Zhao 2018). LR even produced better result than that of high computing techniques including the study by Xu et al. (2012) where LR outperforms ANN, SVM and bivariate statistics methods. In the study by Gautam et al. (2021) LR produced the map with better accuracy than SVM. Similarly, the LR accomplished over 0.9 in AUC values in some studies (e.g., Kavzoglu et al. 2015). Hence, we used LR for LSM in the study of high mountainous region of Nepal.

In LR analysis, the highest coefficient describes the best predictor which depends upon the trends of the landslide distribution within the classes of each causal factor (Mancini et al. 2010). If there are definite trends as in the causal factors: land cover classes and slope of this study; their coefficient is positive and higher, which show the positive correlation for landslide occurrences as suggested by the study from Abedini et al. (2017) and vice versa. We found that the land cover classes were the best predictor among the causal factors followed by slope and distance to fault which align with the study of Mancini et al. (2010) and Roback et al. (2018). However, our study contrast with the result of Xu et al. (2013a): they found slope and distance from the fault has a negative association for landslide occurrence. It is also believed that landslide frequency often decreases with increasing distance to the fault (Xu et al. 2015), however, in this study, there is no clear correlation. Various researcher found different factors as the best predictor in their studies such as slope (Ohlmacher and Davis 2003; Lee and Pradhan 2007; Yilmaz 2009; Xu et al. 2013a; Abedini et al. 2017); soil type (Gautam et al. 2021); distance to road (Ayalew and Yamagishi 2005); terrain roughness and lithology (García-Rodríguez et al. 2008); aspect (Aditian et al. 2018). In the study of Bai et al. (2010), curvature was the best predictor. However, we found that curvature is the less essential factor. Similarly, we found the correlation of PGA in landslide occurrence is weaker compared to other causal factor; even though, the seismic intensity is the reason for land sliding in the region. The triggering factor (either rainfall or earthquake) for landslides (García-Rodríguez et al. 2008) and selection of causal factors could be the reason for such variance in results.

The appearance of the validation landslides in the high susceptibility category determined the performance of the LSM approach which differs among the researchers. Choice of the independent variables (causal factors) (Pradhan and Lee 2010); mapping approaches used (Nefeslioglu et al. 2008); and local factors as well as the method of selecting training and validation landslides (Xu et al. 2012) could result in the varying accuracy of validation in LSM studies. The map validation result of SCAI-decreasing from very low to very high susceptible categories concurs with the study from Kincal et al. (2009), Sdao et al. (2013), Chen et al. (2016), Abedini et al. (2017), and Nicu and Asăndulesei (2018). The prediction AUC value of 83.2% in this study suggest that the LR approach predict the future landslides with satisfactory accuracy. Performance of our study resembles the study by Merghadi et al. (2020) on LSM using LR among the other ML approaches which produced the LSM by LR with satisfactory accuracy (AUC values more than 80%). Considering its simplicity, easiness, highly interpretable capacity, quick processing ability, and ability to calculate probability directly, the LR approach is the best option among the different ML approaches (Merghadi et al. 2020) and produced a result with more than acceptable accuracy (as in our study).

Concluding remarks

To realize the overall impact of the earthquake in the high mountainous regions of Nepal, we studied the landslide distribution, its relationship with causal factors (13 factors), and landslide susceptibility due to co-seismic landslides in Upper Indrawati watershed which is highly impacted by Gorkha earthquake and its aftershocks. It is a critical aspect of hazard mitigation (Merghadi et al. 2020).

The study provided the insight that the landslide occurrences show the apparent positive correlation with the slope as landslide abundance increased with slope angle. The causal factors: fault distance, slope, aspect, landcover, a class of PGA, and soil type showed some trends for landslide occurrences resulting in the positive coefficient in LR analysis. From the study, the region belongs to elevation range (2400–3600) m, higher slope, and forested area are highly susceptible to landslides due to the earthquake. Most of such region belongs to the northern part of the study area where lithology of Pangang formation is dominant and have relatively low population density.

The LR approach used for LSM in the study area shows the good accuracy with AUC value for success and prediction rate curves of 0.843 and 0.832 respectively, and SCAI values higher in very low susceptible category and are descending toward very high susceptible category. Hence, the map could be used for slope improvement and generalized planning of the study area. As the rehabilitation and development of the area has been ongoing, this map could be an asset for the managers including government and private sectors. Site-specific consideration of local factors need to be addressed before implementation. This study added an extra value to the literature for the much-needed clarity and consistency in methodology for selection of causal factors for LR analysis, despite LR has been widely used for LSM (Budimir et al. 2015).