1 Introduction

Climate change plays a pivotal role in redistributing species' ranges, since altered climatic conditions can expand or contract a species' geographic range [1]. The effects of global warming are projected to intensify over the next half-century [2]. Numerous studies indicate that contemporary, human-caused ecological changes have led to the displacement of species [3], alterations in phenology [4], and the extinction of some species [5]. Restoring and safeguarding lesser-known species is likely to be more difficult without sufficient biogeographic data [6, 7]. Integrating biogeographic principles, techniques, and evaluations into conservation biology has significantly strengthened efforts to preserve the habitats of such species.

To understand present and future population dynamics, computational methods are needed to predict how climate change affects the availability of suitable habitat for a given species. Bioclimatic modelling tools are increasingly used to project future species distributions. Species distribution models (SDMs), also referred to as Ecological Niche Modelling (ENM), are now widely used to forecast the past, current, and future habitat suitability of species with a range of mathematical algorithms [8]. SDMs rest on the relationship between an organism and its geographic location, and contemporary models incorporate additional variables such as land-use classifications [9, 10].

Regression and machine learning are the two major families of species distribution modelling (SDM) methods, and both rely on presence-absence data [11]. Constraints inherent to any single model reduce its predictive ability [12], and an over-fitted model is less likely to succeed when applied to new data [13]. Recently, ensemble modelling has been advocated as a way to identify the consensus among different models, which retains the advantages of multiple algorithms while exposing uncertainty to end-users [6, 11, 14,15,16,17,18,19,20].

Indigofera oblongifolia is a valuable but underutilized multipurpose leguminous shrub that has traditionally been used for forage, medicine, herbal tooth-brushing, and restoration of mining-degraded lands [21,22,23]. It is a non-thorny, erect shrub of arid regions found in open, dry areas with stable sandy soils. It is widely distributed in Jordan, Yemen, Bahrain, Eritrea, Somalia, Egypt, Sudan, Senegal, Angola, Nigeria, Arabia, Baluchistan, Pakistan, Java, Sri Lanka, and India [24, 25], as well as Australia and North and South America [26]. The species can flourish under wild conditions (Fig. 1a) and in agrarian settings (Fig. 1b), and bears small flowers with distinct petals (Fig. 1c) and small pods (Fig. 1d). Despite the wide range of ecosystem services and the broad niche breadth of this species in hot arid and semi-arid regions worldwide, little research has modelled its ecological niche at a global scale using present and future bioclimatic predictors together with non-climatic predictors such as human modification of terrestrial ecosystems and soil nutrients, specifically nitrogen and phosphorus.

Fig. 1

Indigofera oblongifolia: under wild condition (a); under agrarian condition (b); at flowering stage (c) and at fruiting stage (d)

Therefore, we used global presence records of I. oblongifolia to achieve the following goals: (a) to identify the global geographical distribution patterns of this species under climate change projections across three bioclimatic timeframes (current, 2050, and 2070) and four greenhouse gas scenarios (RCPs 2.6, 4.5, 6.0, and 8.5); (b) to evaluate the effects on habitat suitability of the existing global livestock population (cattle, goats, and sheep), global fertilizer application (nitrogen and phosphorus), and human modification of terrestrial ecosystems; (c) to evaluate the degree of indigeneity (percent indigeneity) using the geographical area, habitat suitability categories, and number of polygons identified in the first two steps; and (d) to quantify the temporal effects of different bioclimatic variables on its fundamental and realized niche. These objectives guide the estimation of the actual area suitable for the survival of this species across habitat classes and the identification of the factors and thresholds that influence its current and future distribution patterns. The results should help determine the feasibility of large-scale captive plantations to ensure the long-term survival of the species and sustain the ecological benefits it currently provides.

2 Material and methods

2.1 Data collections

Distributional records for this species were obtained from data repositories such as the Global Biodiversity Information Facility (GBIF; https://doi.org/10.15468/dl.7mf5jt) [27] and the Indian Biodiversity Portal (https://indiabiodiversity.org/species/show/279391), from published literature [23, 28, 29], and from our field work during 2005 to 2014 in various districts of the arid and semi-arid areas of Rajasthan, India [30, 31]. The coordinates of these sites were mapped in the WGS84 coordinate system with ArcMap GIS software [32]. To reduce the effects of spatial autocorrelation and eliminate redundant entries, we used the spatial thinning module of Wallace, a graphical user interface based on the R programming language [33], with a thinning distance of 10 km. Using the spatially thinned presence points, we assessed the present IUCN category of the species, with particular emphasis on its Area of Occupancy (AOO) and Extent of Occurrence (EOO), using the R-based “ConR” package.
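The thinning step can be illustrated with a minimal Python sketch. It is not the routine that Wallace runs internally: it uses a simple greedy rule (keep a record only if it lies at least the thinning distance from every record already kept) and great-circle distances on a spherical Earth. The function names `haversine_km` and `thin_records` and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def thin_records(records, min_dist_km=10.0):
    """Greedy thinning: keep a record only if it is at least min_dist_km
    away from every record that has already been kept."""
    kept = []
    for lat, lon in records:
        if all(haversine_km(lat, lon, klat, klon) >= min_dist_km for klat, klon in kept):
            kept.append((lat, lon))
    return kept


# Hypothetical coordinates: the second point is about 1.1 km from the first.
points = [(26.0, 73.0), (26.01, 73.0), (27.0, 73.0)]
print(thin_records(points))
```

A greedy pass depends on the order of the input records, so the retained set is one valid thinned subset rather than a unique or maximal one.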

2.2 Bio-climatic (BC) and non-bioclimatic variables

WorldClim version 2.0 [34] provided the climate data used to predict the present and future distributions of the species. Nineteen bioclimatic variables [35] were downloaded at a spatial resolution of 30 arc-seconds (~1 km²) and converted to ESRI ASCII format in DIVA-GIS version 7.5 [32] for the current climate and for two future timeframes: 2050, which represents mean values from 2041 to 2060, and 2070, which represents mean values from 2061 to 2080 [36]. The future datasets cover four Representative Concentration Pathways (RCPs), namely RCP 2.6, RCP 4.5, RCP 6.0, and RCP 8.5. RCPs are spatial and temporal trajectories of future greenhouse gas (GHG) concentrations and pollutants resulting from human activities. The four pathways span a range from very high (RCP 8.5) to very low (RCP 2.6) future concentrations. RCP 2.6 has the lowest GHG concentration and is characterized by aggressive mitigation and the lowest emissions, RCP 4.5 and RCP 6.0 are intermediate, and RCP 8.5 represents the highest emission scenario. In these pathways, radiative forcing (the global energy imbalance) stabilizes at 2.6 W/m², 4.5 W/m², 6.0 W/m², and 8.5 W/m², respectively [37]. Details of each bioclimatic variable, along with its units and mathematical expression, are given in Table S1.

2.3 The global human modification of terrestrial systems (GHMTS)

The Global Human Modification of Terrestrial Systems (GHMTS) dataset provides a measure of human modification of all terrestrial lands except Antarctica at a resolution of 1 km². The measure is a continuous value from 0 to 1 that quantifies the degree to which a landscape has been modified. It is derived by modelling the physical extents of 13 anthropogenic stressors from spatially explicit global datasets with a median year of 2016. A detailed description of the methodology is provided by Kennedy et al. [38, 39]. For the present study, the global GHMTS dataset was downloaded in GeoTIFF format from the Socioeconomic Data and Applications Center (SEDAC; https://sedac.ciesin.columbia.edu/data/set/lulc-human-modification-terrestrial-systems/data-download) to evaluate the effects of these human-caused changes on the distribution dynamics of the species.

2.4 Global fertilizer (nitrogen and phosphorus and manure production) and livestock population

The global maps of nitrogen and phosphorus fertilizer application rates (kg/ha) documented by Potter et al. [40] are a valuable resource for determining nutrient budgets, modelling the impact of nutrient management on crop growth and yield, and tracing nutrients from mineral extraction and processing to their eventual application in agriculture. The application rates of nitrogen (N) and phosphorus (P) in both fertilizer and manure were obtained from the dataset distributed by the Socioeconomic Data and Applications Center (SEDAC; http://sedac.ciesin.columbia.edu/data/collection/fertilizer-and-manure.html). These datasets give the mean quantity of nitrogen and phosphorus fertilizer applied across all crops within each 0.5-degree grid cell. Grid cell values are expressed in kilograms per hectare (kg/ha) and range from 0 to 370. In the current study, they were used to assess the global distribution of the species in relation to fertilization practices and manure production.

2.5 Gridded Livestock of the World 2015

The dataverse contains a spatial dataset (~1 km²) of livestock distribution, which can support work on development, environmental quality, and health. In the present study, sheep, goat, and cattle population densities were retrieved from the online database https://data.apps.fao.org/catalog/dataset/glw (January 2023) at a resolution of 5 arc-minutes, following the approach of Gilbert et al. [41].

2.6 Issue of multicollinearity

The present study used the Pearson correlation coefficient (r) to examine cross-correlation among predictors, and a multicollinearity test was carried out to reduce the risk of over-fitting. Following the methodology of [42], we removed variables with cross-correlation coefficients equal to or greater than 0.85, using the Niche Tool Box ([43, 44]; https://github.com/luismurao/ntbox). From each pair of highly cross-correlated variables, the one with greater biological relevance to the species was retained to simplify model interpretation [20, 45].
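The correlation filter described above can be sketched as follows. This is an illustrative reimplementation and not the ntbox routine: predictors are visited in a user-supplied order of biological relevance (a hypothetical ranking), and a predictor is retained only if its absolute Pearson correlation with every predictor already retained is below the 0.85 threshold. The variable names and values are made up for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def drop_correlated(columns, priority, threshold=0.85):
    """Keep predictors in priority order, skipping any whose |r| with an
    already-kept predictor is equal to or greater than the threshold."""
    kept = []
    for name in priority:
        if all(abs(pearson_r(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical values extracted at five presence points.
predictors = {
    "bc15": [5, 1, 4, 2, 3],
    "bc1": [1, 2, 3, 4, 5],
    "bc2": [2, 4, 6, 8, 10.5],  # nearly collinear with bc1
}
print(drop_correlated(predictors, priority=["bc15", "bc1", "bc2"]))
```

Because the outcome depends on the priority order, the ranking is where biological judgement about the species enters the procedure.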

2.7 Projection transformation

Because the bioclimatic (BIO) and non-bioclimatic (non-BIO) variables come from different sources and have different resolutions, their projections had to be harmonized before data extraction and ensemble model prediction. This was done through a sequence of steps in ArcMap using ArcToolbox. First, the projection was defined in the "Projections and Transformations" toolset of Data Management Tools, using the WGS 1984 EASE Geographic Coordinate System (GCS). Measuring the extent of each habitat suitability class with the "Calculate Geometry" function of ArcMap requires a Projected Coordinate System (PCS) [46]. We therefore converted the habitat class raster files to WGS 1984 Web Mercator (Auxiliary Sphere, 3857). This step allows the area of each class to be computed in a user-specified unit (here, square kilometres).

2.8 Ensemble species distribution modelling

For ensemble species distribution modelling, the following algorithms from the Stacked Species Distribution Models (SSDM) R package [47] were used: Generalized Linear Model with Gaussian distribution (“GLM”), Generalized Additive Model (“GAM”) [48], Support Vector Machines (“SVM”), Random Forest (“RF”) [49, 50], Multivariate Adaptive Regression Splines (“MARS”), Maximum Entropy (Maxent v. 3.4.1) [51, 52], Artificial Neural Network (“ANN”) [53], and Classification Tree Analysis (“CTA”) [54]. We used the default settings for all algorithms [20]. We built the models using 70% of the data (training set) and evaluated model performance with the remaining 30% (evaluation set) [43]. The final models were built separately for each set of predictors and each RCP, and combined using a weighted mean. For more robust predictions, models were evaluated using k-fold cross-validation with 10 folds and 10 replications per algorithm, and settings other than the defaults were also applied for some algorithms [55]. To reconcile the strengths and weaknesses of the individual algorithms, SSDM averaged multiple model evaluation criteria, and model quality was reported through an ensemble scoring system [56] based on the following metrics: area under the curve (AUC) of the receiver operating characteristic (ROC), True Skill Statistic (TSS), and Kappa [57]. Kappa and TSS values were graded as excellent (0.85 to 1), very good (0.7 to 0.85), good (0.55 to 0.7), fair (0.4 to 0.55), and fail (less than 0.4) [2, 58]. AUC values were graded as exceptional (0.90–1.00), very good (0.8–0.9), good (0.7–0.8), fair (0.6–0.7), and poor (0.5–0.6).
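To make the evaluation metrics concrete, the sketch below computes TSS and Cohen's Kappa from a 2 × 2 confusion matrix using their standard definitions and then applies the grading scale quoted above. It is not SSDM code. The function names are hypothetical, and treating each lower bound as inclusive is an assumption, since the text does not say which class a boundary value belongs to.

```python
def tss_and_kappa(tp, fp, fn, tn):
    """True Skill Statistic and Cohen's Kappa from confusion-matrix counts."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    tss = sensitivity + specificity - 1
    observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return tss, kappa


def grade_tss_kappa(value):
    """Grade a TSS or Kappa value; lower bounds are assumed inclusive."""
    if value >= 0.85:
        return "excellent"
    if value >= 0.70:
        return "very good"
    if value >= 0.55:
        return "good"
    if value >= 0.40:
        return "fair"
    return "fail"


# Hypothetical evaluation set: 50 presences and 50 absences.
tss, kappa = tss_and_kappa(tp=40, fp=10, fn=10, tn=40)
print(round(tss, 2), round(kappa, 2), grade_tss_kappa(tss))
```

With balanced presences and absences, as in this made-up example, TSS and Kappa coincide; they diverge as prevalence moves away from 0.5.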

2.9 Post distribution analysis

2.9.1 Habitat suitability

The ASCII raster outputs of each SDM analysis, for both bioclimatic and non-bioclimatic predictors, were imported into ArcMap, and the cell values (0 to 1) were used to categorize habitats for this species. Using the constant break points for each class outlined by Khan et al. [59], we partitioned the area into four suitability classes (optimum, moderate, marginal, and low), with the remaining cells treated as absent/inappropriate. We used the Raster Calculator (Spatial Analyst Tools/Map Algebra/Raster Calculator) to determine the total area, in square kilometres, of each suitability class. For each set of predictors, we then calculated the area of each habitat class as a percentage of the total global land area (PTGLA). Using the Variable Importance Percentage (VIP) from each analysis, we identified the bioclimatic and non-bioclimatic predictors with the greatest influence on the distribution pattern of this species [58].
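The classification and area tally can be sketched in a few lines of Python. The exact break values are an assumption: the sketch uses constant 0.20-wide intervals (0.2, 0.4, 0.6, and 0.8 as lower bounds), and it assumes every cell covers the same area, which holds only for an equal-area grid. The names `classify_cell` and `class_areas_km2` are hypothetical.

```python
# Assumed lower bounds of each class, checked from highest to lowest.
CLASS_BREAKS = [(0.8, "optimum"), (0.6, "moderate"), (0.4, "marginal"), (0.2, "low")]


def classify_cell(value):
    """Map a suitability value in [0, 1] to a habitat class label."""
    for lower, label in CLASS_BREAKS:
        if value >= lower:
            return label
    return "absent"


def class_areas_km2(cells, cell_area_km2=1.0):
    """Sum the area per class, assuming every cell covers cell_area_km2."""
    areas = {}
    for value in cells:
        label = classify_cell(value)
        areas[label] = areas.get(label, 0.0) + cell_area_km2
    return areas


# Hypothetical raster cell values.
print(class_areas_km2([0.9, 0.85, 0.5, 0.05]))
```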

The global distribution output for I. oblongifolia was further analysed by computing the area and corresponding PTGLA values under the four suitability classes at both the continental (Africa) and national (India) levels. The main aim of this exercise was to quantify the Percent Indigeneity (PI) of the species, using a newly proposed index that weights the habitat suitability categories as follows:

$$\text{Percent Indigeneity}=\frac{\text{Habitat suitability type}\times \sum_{i=0}^{n}\text{polygon number}}{\sum_{i=0}^{n}\text{Area (sq. km)}}\times 100$$

The habitat suitability type scoring factors are: Optimum = 1; Moderate = 0.75; Marginal = 0.50; and Low = 0.25. \(\sum_{i=0}^{n}\text{polygon number}\) is the sum of polygon numbers belonging to a specific suitability class, and \(\sum_{i=0}^{n}\text{Area (sq. km)}\) is the total area (sq. km) occupied by the species under that suitability class. A high index value indicates a higher probability that the species is restricted to a confined zone, whereas a lower value indicates greater generality of the species.
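As a worked illustration of the index, the following sketch evaluates the formula for one suitability class. The polygon counts and areas are invented for the example, and the area unit passed to the function is an assumption (the index value scales directly with the unit chosen). The function name `percent_indigeneity` is hypothetical.

```python
# Scoring factors for the habitat suitability types, as given in the text.
SUITABILITY_WEIGHTS = {"optimum": 1.0, "moderate": 0.75, "marginal": 0.50, "low": 0.25}


def percent_indigeneity(suitability_class, polygon_counts, areas_sq_km):
    """Weight x (sum of polygon numbers) / (sum of areas) x 100 for one class."""
    weight = SUITABILITY_WEIGHTS[suitability_class]
    return weight * sum(polygon_counts) / sum(areas_sq_km) * 100


# Hypothetical moderate-class patches: 8 polygons over 400 area units.
print(percent_indigeneity("moderate", [3, 5], [200.0, 200.0]))
```

For a fixed area, more polygons raise the index, and for a fixed polygon count, a larger area lowers it, so the index behaves as a weighted patch density.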

2.9.2 Ellipsoid niche hypervolume

Machine learning models identify the variables that contribute most to predicting where the species occurs. The niche hypervolumes of this species were quantified using the top three predictors for each bioclimatic scenario and RCP. For this we used NicheToolBox (ntbox) [60], a GUI tool written in the R programming language that takes the raster layers of the BC variables as input. Ellipsoidal models were constructed by calculating the centroid and covariance matrix of the environmental values of the species. The ellipsoid extends from its centre outward across the conditions available in the study region. This methodology allowed us to identify the environmental variables that govern both the fundamental and realized niche of this species.
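The centroid-and-covariance construction can be shown with a small sketch. For brevity it uses two environmental variables instead of the three used in the study, so the covariance matrix can be inverted in closed form, and it does not reproduce the ntbox implementation. A point's squared Mahalanobis distance from the centroid measures how far it lies from the niche centre; comparing it with a cut-off (for example 5.991, the 95% chi-square quantile for two dimensions) to decide membership is an assumption of this sketch. Function names and data are hypothetical.

```python
def fit_ellipsoid_2d(points):
    """Centroid and sample covariance (sxx, sxy, syy) of 2-D environmental values."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    sxx = sum((p[0] - cx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - cy) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points) / (n - 1)
    return (cx, cy), (sxx, sxy, syy)


def mahalanobis_sq(point, centroid, cov):
    """Squared Mahalanobis distance using the closed-form 2 x 2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - centroid[0], point[1] - centroid[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical (variable 1, variable 2) values at four presence points.
occurrences = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
centroid, cov = fit_ellipsoid_2d(occurrences)
inside = mahalanobis_sq((2.0, 1.0), centroid, cov) <= 5.991  # assumed cut-off
print(centroid, inside)
```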

3 Results

3.1 Data processing and multicollinearity

From a comprehensive search of sources worldwide, we extracted 334 occurrence sites of this species. Using the spatial thinning module of Wallace [33] with a thinning distance of 10 km, we retained only one record within any given neighbourhood. The ENM was ultimately developed with 225 presence points of I. oblongifolia, spatially thinned to remove spatial autocorrelation. These widely scattered coordinates were passed to ConR to calculate the present Extent of Occurrence (EOO) and Area of Occupancy (AOO). Figure 2 shows the outcome: an EOO of 50,640,914 km² and an AOO of 900.0 km², with the former indicating Least Concern status and the latter Vulnerable status. The final set of bioclimatic variables and their corresponding VIP values are listed in Table 1, where the "x" symbol denotes removed variables. Because of their high correlation with other bioclimatic variables, BC-2 and BC-19 were removed from all analyses.

Fig. 2

Graphical display of the 225 spatially thinned presence points of Indigofera oblongifolia, showing the Extent of Occurrence (EOO) and Area of Occupancy (AOO)

Table 1 Variable Importance Percentage (VIP) of different bio-climatic variables during current, 2050 and 2070 timeframe with their respective RCPs

3.2 Model performances

Ensemble model quality was excellent, with AUC approaching 0.90 for the current climate and for the RCPs of the 2050 and 2070 timeframes (Table 2). Among the individual algorithms, Random Forest performed best, with the highest AUC values across all climatic predictors (Tables S2 and S3). It yielded Kappa and TSS values greater than 0.82, indicating high model quality. In contrast, Maxent gave low Kappa values with all climatic predictors, which suggests that this widely used ENM technique performed poorly for this species. With the Global Human Modification of Terrestrial Systems (GHMTS) and the N and P fertilizer predictors, the ensemble model was of good quality, with AUC values below 0.80. For these predictors, Random Forest and Maxent had the highest and lowest Kappa values, respectively (Table S4). Livestock population densities (cattle, goat, sheep) gave higher AUC values (> 0.80) with both the individual algorithms and the ensemble model.

Table 2 Ensemble model qualities parameters with both bioclimatic as well as non-bioclimatic predictors

3.3 Variable importance percentage

We used the Variable Importance Percentage (VIP) from each analysis to identify which of the bioclimatic and non-bioclimatic predictors most affects the distribution pattern of this species (Table 1). Precipitation Seasonality (BC-15) was the most influential variable for the current climate, 2050 RCP 8.5, and 2070 RCPs 2.6, 4.5, and 8.5. Mean Temperature of Wettest Quarter (BC-8) was the second most influential variable for 2070 RCPs 2.6 and 4.5. Precipitation of Wettest Quarter (BC-16), Maximum Temperature of Warmest Month (BC-5), Mean Temperature of Driest Quarter (BC-9), and Mean Temperature of Warmest Quarter (BC-10) were identified as important variables for 2050 RCP 2.6, 2050 RCP 4.5, 2050 RCP 6.0, and 2070 RCP 6.0, respectively (Table 1). Among the human-related factors, GHMTS had the largest impact on the distribution of this species (VIP = 60.84), followed by nitrogen fertilizer (VIP = 23.64) and phosphorus fertilizer (VIP = 15.51). Among the livestock predictors, goat density had the greatest impact (VIP = 64.06), followed by sheep and cattle densities with VIP values of 20.17 and 17.75, respectively.

Trends in the predicted suitability of this species with respect to the bioclimatic variables are shown by the response curves in Figs. 3 and 4. We plotted response curves for the two most important predictors of each climate timeframe and RCP. Precipitation Seasonality is defined as the coefficient of variation of monthly precipitation (the ratio of the standard deviation of monthly total precipitation to its mean). Habitat suitability of this species generally rose in step with this variable up to a value of 150 for the current climate (Fig. 3a), 2050 RCP 8.5 (Fig. 3i), and 2070 RCPs 2.6 (Fig. 4a), 4.5 (Fig. 4c), and 8.5 (Fig. 4g). Habitat suitability also tracked Precipitation of Wettest Quarter (BC-16) under 2050 RCP 2.6, for which 500 mm was the optimum (Fig. 3c). A Maximum Temperature of Warmest Month of 30 °C was most suitable under 2050 RCP 4.5 (Fig. 3e). Similarly, a Mean Temperature of Driest Quarter between 25 and 30 °C supported this species under 2050 RCP 6.0 (Fig. 3g). A Mean Temperature of Warmest Quarter (BC-10) of 34 °C was most suitable under 2070 RCP 6.0 (Fig. 4e).
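The definition of Precipitation Seasonality given above can be written out as a short calculation. The sketch follows the ratio as stated in the text and expresses it as a percentage, which is an assumption made so that values are comparable with the 0 to 150 range of the response curves. Using the population standard deviation is also an assumption, and published layers may apply small adjustments that are not reproduced here. The monthly totals are invented.

```python
import statistics


def precipitation_seasonality(monthly_precip_mm):
    """Coefficient of variation of monthly precipitation, in percent.
    Undefined (division by zero) when every month has zero precipitation."""
    mean = statistics.fmean(monthly_precip_mm)
    sd = statistics.pstdev(monthly_precip_mm)  # population SD: an assumption
    return sd / mean * 100


# Hypothetical site with six dry months and six months of 20 mm each.
print(precipitation_seasonality([0] * 6 + [20] * 6))
```

A site with rain spread evenly over the year gives a value near 0, while a strongly monsoonal site with rain concentrated in a few months gives values above 100.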

Fig. 3

Response curves showing dependence of habitat suitability on the predictors related to current and 2050 bio-climatic timeframes and their RCPs

Fig. 4

Response curves showing dependence of habitat suitability on the predictors related to 2070 bio-climatic timeframes and their RCPs

3.4 Area (sq. km × 10³) under different suitability classes and their percent of total geographical land area

3.4.1 Global scenario

The spatial extents of the habitat suitability classes identified for I. oblongifolia by ensemble modelling are shown in Fig. 5 (current bioclimatic predictors and non-climatic variables, namely livestock population, GHMTS, and nitrogen and phosphorus fertilizers), Fig. 6 (2050 bioclimatic predictors with RCPs 2.6, 4.5, 6.0, and 8.5), and Fig. 7 (2070 bioclimatic predictors with RCPs 2.6, 4.5, 6.0, and 8.5). Based on the cell values of the raster output processed in ArcMap, we distinguished four suitability classes at 0.20-point breaks, designated optimum, moderate, marginal, and low [20]. The maps show that this species currently occurs in Australia, southern Asia (India and Pakistan), the Arabian Peninsula (including Oman, Muscat, Qatar, and Yemen), and African countries such as Somalia, Sudan, Chad, Niger, Mali, and Senegal (Fig. 5A). In Australia, it was found at Pardo, Karratha, Port Hedland, Broome, Derby, the St. George Ranges, and the Muller Ranges (Fig. 6A, B).

Fig. 5

Habitat suitability of Indigofera oblongifolia with current bio-climatic predictors (A) and non-climatic variables: livestock population (B), and global human modification of terrestrial systems (GHMTS) together with nitrogen and phosphorus fertilizers (C)

Fig. 6

Habitat suitability of Indigofera oblongifolia with 2050 bio-climatic predictors (A) RCP 2.6; (B) RCP 4.5; (C) RCP 6.0 and (D) RCP 8.5

Fig. 7

Habitat suitability of Indigofera oblongifolia with 2070 bio-climatic predictors (A) RCP 2.6; (B) RCP 4.5; (C) RCP 6.0 and (D) RCP 8.5

The Indian states of Gujarat, Tamil Nadu, and Andhra Pradesh also showed a reduction in suitable area under this climatic projection. In the Arabian Peninsula, only Oman had a higher concentration. In Africa, the species was absent from Somalia and its range had shrunk in Sudan, Chad, Niger, Mali, and Senegal (Fig. 7B and C). With 2050 RCP 8.5, however, its optimum suitability areas were drastically reduced in Australia, the Arabian Peninsula, and southern India (Andhra Pradesh and Tamil Nadu). Its main optimum area under this scenario lay in the Indian states of Rajasthan and Gujarat and around the Pakistani cities of Hyderabad and Karachi. The area of the species also shrank across Africa, including in Sudan, Chad, Niger, Mali, and Senegal (Fig. 7D). With 2070 RCP 2.6, we observed more shrinkage in the Australian region than in 2050 (Fig. 7A). With 2070 RCPs 4.5, 6.0, and 8.5, the species was completely absent from the Arabian region, its range narrowed markedly in the African region (primarily Sudan), and it declined in southern India and in a small patch of Australia (Fig. 7B, C and D).

The global area (sq. km × 10³) calculated for each suitability class under the current, 2050, and 2070 climates and RCP scenarios, and under the non-bioclimatic predictors, is given in Table 3. In the optimum class, the largest area (10,803.64) was recorded with GHMTS together with nitrogen and phosphorus fertilizers. Among the bioclimatic predictors, the largest area in this class (7638.67) was recorded with 2050 RCP 2.6, and the smallest area among all predictors (2910.09) was recorded with 2070 RCP 6.0.

Table 3 Global area (sq. km × 10³) of the different suitability classes and their percent of the total geographical land area (PTGLA) of the world (510,072,000 sq. km) for I. oblongifolia under current bio-climatic conditions, projected 2050 and 2070 bio-climatic conditions with different RCPs, and non-bioclimatic variables (GHMTS, N and P, and livestock population)

At the global level, the area under the optimum suitability class increased relative to the current area with 2050 RCPs 2.6 and 4.5, whereas it declined with the remaining climatic timeframes and RCPs (Table 3). As a percentage of the total global land area (PTGLA), this species has only 2.12, 1.46, and 1.50 percent of area in the optimum category with GHMTS together with N and P, livestock, and current climatic conditions, respectively.
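The PTGLA values follow directly from the class areas. As a check, the sketch below converts the largest optimum-class area reported in the text (10,803.64 × 10³ sq. km, with GHMTS together with N and P) into a percentage of the reference area quoted in the Table 3 caption (510,072,000 sq. km). The function name `ptgla` is hypothetical.

```python
WORLD_AREA_SQ_KM = 510_072_000  # reference area quoted in the Table 3 caption


def ptgla(area_thousand_sq_km, reference_area_sq_km=WORLD_AREA_SQ_KM):
    """Class area (given in sq. km x 10^3) as a percent of the reference area."""
    return area_thousand_sq_km * 1_000 / reference_area_sq_km * 100


print(round(ptgla(10_803.64), 2))  # → 2.12
```

The result matches the 2.12 percent reported for this predictor set.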

Our bioclimatic predictions show that this species occurs in great abundance in both Africa and India; therefore, to quantify its indigenous status, we examined its distribution in both regions under the various climate scenarios.

3.4.2 Continent scenario (Africa)

Results on the distribution patterns of this species across the African continent and the area it covers in the habitat suitability classes under the different climatic timeframes and RCPs are presented in Fig. 8 and Table 4, respectively. Our results indicate that this species likely occurs in Sudan, Chad, Niger, Mali, and Mauritania, all of which contain habitats ranging from highly to moderately suitable for its survival. Under 2070 RCP 6.0, however, countries such as Zambia, Angola, Gabon, and the Democratic Republic of the Congo no longer showed any indication of this species (Fig. 8). Under 2070 RCP 8.5, some new areas of moderate suitability appeared in Botswana, Zimbabwe, and Namibia.

Fig. 8

Habitat suitability of Indigofera oblongifolia in Africa with 2050 and 2070 bio-climatic predictors and their respective RCPs 2.6, 4.5, 6.0 and 8.5

Table 4 Area (sq. km × 10³) of the different suitability classes in Africa and their percent of the total geographical land area (PTGLA) of the world (510,072,000 sq. km) for I. oblongifolia under current bio-climatic conditions, projected 2050 and 2070 bio-climatic conditions with different RCPs, and non-bioclimatic variables (GHMTS, N and P, and livestock population)

The maximum area under the optimum category (5050.67 sq. km × 10³) was recorded with 2050 RCP 2.6, which is 16.63% of the total land area of Africa (Table 4); the lowest area under this class (2379.32 sq. km × 10³) was recorded with 2070 RCP 6.0, which is only 7.84% of the total land area of Africa. Relative to current conditions, the area in this category was smaller for all future climate and non-climate scenarios except 2050 RCP 2.6. Both non-climatic predictor sets gave similar areas in this class (Table 4), ranging from 11.48 to 11.68 percent of the total land area of Africa. The maximum area under the moderate class (7660.19 sq. km × 10³), however, was observed with the livestock predictors, followed by the current climate (7246.6 sq. km × 10³) and 2050 RCP 2.6 (7038.84 sq. km × 10³).

3.4.3 Country level (India)

Results related to the distribution patterns of this species across the India and the area covered by it in various habitat suitability classes under different climatic and RCPs predictors are presented in Fig. 9 and Table 5, respectively. Within India, our analysis revealed clear zonation of different habitat suitability classes; the optimum places for this species were consistently in the west, including the states of Rajasthan and Gujarat; however, under the RCPs of 2.6, 4.5, and 6.0 of 2050, there are also suitable locations in the south, in states like Tamil Nadu and Andhra Pradesh. These southern regions have become moderately suitable with remaining climatic time frame and RCPs. The central, southwestern, and eastern regions of India provide moderate habitat for this species under RCP 2.6 and 4.5 in 2050, but under all other RCPs, these regions will fragment into smaller and smaller patches (Fig. 9). The eastern part of the country has been labelled as a low suitable area, while the northern part of the country is completely devoid of this species. Two very different scenarios for this species were displayed within India in relation to GHMTS, Nitrogen, and Phosphorus fertilizers, and livestock population densities. When we looked at the former non-bioclimatic variables, we found that there were intermingled classes of habitat suitability (Fig. 10). When travelling across India with livestock, we regularly saw large regions that fell into either the optimum (Western and Southern regions) or the moderate (Central, Eastern, and Western regions) categories. As obvious, we recorded the higher areas under the optimum class with non-bioclimatic variables, which were roughly covering 24 to 28% of India total land area. Among the climatic time-frame and RCPs, highest (578.79 Sq. 3 km × 103; 17.60%) and lowest (240.39 Sq. km × 103; 7.61%) area under this class were recorded with 2050 RCP 4.5 and 2070 RCP 2.6, respectively. 
Moderately suitable areas for this species covered 24.38% (2070 RCP 4.5) to 58.15% (2050 RCP 4.5).

Fig. 9

Habitat suitability of Indigofera oblongifolia in India under 2050 and 2070 bio-climatic predictors and their respective RCPs (2.6, 4.5, 6.0 and 8.5)

Table 5 India: area (Sq. km × 10³) of the different suitability classes and their percent of total geographical land area (PTGLA) of the world (510,072,000 sq. km). Area under different habitat suitability classes of I. oblongifolia with current bio-climatic conditions, projected 2050 and 2070 bio-climatic predictions with different RCPs, and non-bioclimatic variables (GHMTE, N and P, and livestock population)
Fig. 10

Habitat suitability of Indigofera oblongifolia in India and Africa with the non-bioclimatic variables GHMTE, N and P, and livestock population

3.4.4 Percent Indigeneity of I. oblongifolia

We calculated the percent indigeneity of this species using a weighting of habitat types, the number of polygons (the number of sites occupied by this species), and the total occupied area under each suitability class; the results are shown in Fig. 11 (Africa) and Fig. 12 (India). This index is proposed here for the first time, and our results indicate that the indigeneity of this species is largely driven by the number of polygons, i.e., by a higher occurrence probability across different patches.

Fig. 11

Per cent Indigeneity of Indigofera oblongifolia in Africa for different habitat types under different climatic and non-climatic projections

Fig. 12

Per cent Indigeneity of Indigofera oblongifolia in India for different habitat types under different climatic and non-climatic projections

Among all predictors, livestock population gave the highest indigeneity for the optimum class in Africa, followed by GHMTE and nitrogen and phosphorus fertilizers. Among the climatic time frames and RCPs, the highest indigeneity value (2.06) was recorded with 2050 RCP 6.0 and the lowest (0.32) with 2070 RCP 8.5. Compared with the current bio-climatic predictor, our results show an increase in indigeneity under this class with the 2050 and 2070 RCP projections. Moderate habitat had higher indigeneity values under the climatic predictors than optimum habitat. For the low suitability class, values of this index were lower, indicating a generalist pattern. The indigeneity score in the optimum class was higher in India than in Africa. The highest value (7.90) was recorded with livestock, followed by 2050 RCP 6.0 with a value of 7.0 (Fig. 12).

3.4.5 Ellipsoid niche hypervolume

We developed an ellipsoid hypervolume (the multidimensional space of resources available to a species) to simulate the species' fundamental niche (a species' ability to persist and reproduce across a broader range of environments when not competing with other species) and its realized niche (the niche it occupies in the presence of other interacting species), using occurrence records of I. oblongifolia and the most important environmental variables identified from ensemble modelling, in the form of raster output. This allows us to identify the bio-climatic factors that govern its fundamental and realized niches. The results are displayed in Fig. 13 (current and 2050 with its four RCPs) and Fig. 14 (2070 with its four RCPs). In both figures, blue represents niche stability, green represents niche unfilling (the proportion of the native niche that does not overlap with the exotic niche), and red represents niche expansion (Ahmed et al. 2019). The size of these zones corresponds to the volume of the niche.

Fig. 13

Graphical representation of the Indigofera oblongifolia niche hypervolume with the three most influential bioclimatic variables for the current and 2050 (RCP 2.6, 4.5, 6.0 and 8.5) bioclimatic time frames. Blue represents niche stability, green represents niche unfilling (the proportion of the native niche that does not overlap with the exotic niche), and red represents niche expansion

Fig. 14

Graphical representation of the Indigofera oblongifolia niche hypervolume with the three most influential bioclimatic variables for the current and 2070 (RCP 2.6, 4.5, 6.0 and 8.5) bioclimatic time frames. Blue represents niche stability, green represents niche unfilling (the proportion of the native niche that does not overlap with the exotic niche), and red represents niche expansion

In terms of bioclimatic space, the ellipsoidal niche of I. oblongifolia had its largest hypervolume (52.81 × 10⁴ °C·mm²) under 2050 RCP 2.6, followed by current conditions (18.44 × 10³ °C·mm²), and its smallest under 2050 RCP 6.0 (23.74 × 10² °C·mm²). Among the RCPs of the 2070 projection, the highest hypervolume (13.32 × 10³ °C·mm²) was quantified with RCP 6.0, followed by RCP 2.6 (11.31 × 10³ °C·mm²), and the lowest (41.89 × 10² °C·mm²) with RCP 8.5. The influence of environmental variables on niche dynamics is indicated by their centroid values: their proximity to the centroid indicates that they exert control over species suitability [61].
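As a rough illustration of how an ellipsoid hypervolume in a three-variable bioclimatic space can be quantified, the sketch below computes the volume of the ellipsoid defined by the covariance matrix of environmental values at occurrence points. The function names, the sample records, and the 95% chi-square inclusion level are assumptions for illustration; the study's own software and settings are not described in this section and may differ.

```python
import math

# Assumed inclusion level: 95% chi-square quantile for 3 degrees of freedom.
CHI2_95_DF3 = 7.815


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observation rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def det3(m):
    """Determinant of a 3 x 3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def ellipsoid_volume(rows, chi2=CHI2_95_DF3):
    """Volume of the ellipsoid (x - mu)^T S^-1 (x - mu) <= chi2 in three dimensions."""
    determinant = det3(covariance_matrix(rows))
    if determinant <= 0:
        raise ValueError("covariance matrix is singular; need non-coplanar points")
    return 4.0 / 3.0 * math.pi * chi2 ** 1.5 * math.sqrt(determinant)


# Hypothetical occurrence records: (max temperature in deg C, annual precipitation
# in mm, precipitation seasonality). Values are illustrative only.
records = [
    (38.1, 210.0, 95.0), (40.3, 180.0, 120.0), (36.5, 320.0, 88.0),
    (41.0, 150.0, 140.0), (39.2, 260.0, 101.0), (37.4, 290.0, 110.0),
]
print(ellipsoid_volume(records))
```

Because the volume is the product of the ellipsoid's semi-axes, its units are the product of the units of the three variables, which is why hypervolumes built from one temperature and two precipitation variables are expressed in °C·mm².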

Centroid values and ranges of the different bio-climatic variables for the various time frames and RCPs are presented in Table 6. With the current BC, this species showed the greatest niche expansion from its fundamental niche with annual precipitation (BC-12, i.e., a water variable), while precipitation seasonality (BC-15, water availability) and maximum temperature of the warmest month (BC-5) were identified as facilitators that maintain its fundamental niche areas. Further, our data revealed that BC-5 and mean temperature of the wettest quarter were associated with niche expansion under 2050 RCP 4.5 and 6.0 and 2070 RCPs 2.6, 4.5 and 8.5, while mean temperature of the warmest quarter (BC-10) was associated with niche expansion under RCP 8.5 of 2050 and RCP 6.0 of 2070. Precipitation seasonality was also identified as a major facilitator maintaining the fundamental niche of this species under all RCPs of 2070 and RCP 8.5 of 2050 (Table 6). Temperature seasonality was the most important factor for niche expansion under 2050 RCP 2.6, and precipitation of the driest month (BC-14) was identified as the most important factor maintaining its fundamental niche area in the same period.

Table 6 Centroid values (and ranges) of the different bio-climatic variables for the various time frames and RCPs

4 Discussion

Developing current habitat models and forecasting the future distribution range of plant species are pivotal to designing recovery strategies for their native habitats. To devise effective conservation and management strategies, we need a better understanding of species' dispersal patterns and of the effects of climate change on their potential habitat suitability [62, 63]. Clarifying the relationship between a species' range and its susceptibility to extinction not only improves our knowledge but also provides the basic parameters that delimit the geographical extent of that range [64, 65]. The ecological niches of numerous plant species have been modified by rising global temperatures in recent decades [66,67,68].

Regression and machine learning are the two main subfields of species distribution modeling (SDM) that use presence-absence data. Generalized linear models, generalized additive models, and multivariate adaptive regression splines (MARS) are all regression-based techniques. Artificial neural networks (ANNs), classification and regression trees (CARTs), maximum entropy (MaxEnt) algorithms, genetic algorithms (GAs), and random forests (RFs) are all machine learning approaches. Our literature review indicated considerable variation in performance among these methods, implying that it may be difficult to predict which model characteristics improve model performance. The optimum model depends not only on how the model's assumptions relate to the assembly processes that shape a specific community, but also on other aspects such as the quantity, quality, and spatial organization of the data. Even if two data sets appear to be identical, they may be best modeled using distinct methodologies [11]. In a nutshell, the methodologies differ in terms of species data (absence/presence vs presence-only) and prediction parameters (mechanistic-physiological constraint vs empirical-climatic approach). The shortcomings of any model restrict its predictive power. Over-adjusted models are less likely to be useful when applied to new data [13]. Ensemble approaches, which are more accurate than individual methods, are one option. Thus, among the results of several algorithms, the best-fitting prediction of a species' future range may be used [36]. Ensemble modeling is offered as a method for improving model predictions and determining how comparable several model outcomes are. The essential premise of ensemble (or sometimes consensus) modeling is that different modeling outputs can be viewed as alternative states of the underlying distribution. This method combines projections from multiple models into a single, unified surface by averaging.
Studies of several species, including Abies pindrow, Adenocarpus mannii, Angylocalyx oligophyllus, Baphia maxima, Berlinia grandiflora, Betula utilis, Bradypus variegatus, Cassia mannii, Cedrus deodara, Cynometra hankei, Dialium tessmannii, Oxybaphus himalaicus, Picea smithiana, Pinus wallichiana, Pseudovigna sulaensis, Psophocarpus scandens, Quercus ilex, Rhododendron arboreum, Rhododendron delavayi, Sclerocarya birrea, Thryothorus ludovicianus, and Vigna comosa, have demonstrated the predictive accuracy of ensemble techniques for ENM [20].
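The idea of combining projections from multiple models into a single surface can be sketched as a weighted mean of per-model suitability grids, with the per-cell spread across models serving as a simple indicator of uncertainty. This is a minimal illustration under stated assumptions: the grids, the model names used as keys, the weights, and the function names are hypothetical and are not taken from the study's workflow.

```python
def ensemble_surface(predictions, weights):
    """Weighted mean of per-model suitability grids (lists of rows, values in 0-1)."""
    names = list(predictions)
    total_weight = sum(weights[name] for name in names)
    n_rows = len(predictions[names[0]])
    n_cols = len(predictions[names[0]][0])
    return [
        [sum(weights[name] * predictions[name][r][c] for name in names) / total_weight
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


def model_spread(predictions):
    """Per-cell range (max - min) across models, a simple measure of disagreement."""
    grids = list(predictions.values())
    n_rows, n_cols = len(grids[0]), len(grids[0][0])
    return [
        [max(g[r][c] for g in grids) - min(g[r][c] for g in grids)
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical 2 x 2 suitability grids from three algorithms (illustrative values).
predictions = {
    "GLM": [[0.20, 0.60], [0.80, 0.10]],
    "RF": [[0.30, 0.70], [0.90, 0.20]],
    "MaxEnt": [[0.25, 0.50], [0.70, 0.15]],
}
# Hypothetical weights, for example each model's TSS on held-out data.
weights = {"GLM": 0.60, "RF": 0.80, "MaxEnt": 0.70}

consensus = ensemble_surface(predictions, weights)
uncertainty = model_spread(predictions)
print(consensus)
print(uncertainty)
```

Weighting by an evaluation score lets better-performing models contribute more to the consensus, while the spread grid flags cells where the algorithms disagree.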

The current study used an ensemble machine learning approach to characterize the distribution of I. oblongifolia, a leguminous species that has received comparatively little scholarly attention, and to identify the factors that influence this distribution in both time and space. Previous ecological niche modelling of leguminous species has covered Amorpha canescens Pursh and Dalea candida Michx. [69], and Cynometra hankei Harms, Pseudovigna sulaensis R. Clark & Burgt, Adenocarpus mannii Hook fil., and Berlinia grandiflora (Vahl) Hutch. & Dalziel [18]. In contrast to these studies of leguminous flora, ours has a global scope. Furthermore, we considered additional variables, namely anthropogenic activities, the application of nitrogen and phosphorus fertilizers, and the density of livestock populations.

The findings of this study indicate that climatic predictors improve model quality more than non-climatic predictors. Moreover, the ecological niche of this species is influenced by energy-related parameters, such as the maximum temperature of the warmest month and the mean temperatures of the driest and warmest quarters, and by hydrological variables, such as precipitation seasonality and precipitation of the wettest quarter. The use of AUC, TSS and Kappa values to assess ensemble model performance, specifically in SDMs of plant species, has been advocated in many previous studies [12, 59, 70, 71]. The AUC is extensively used to evaluate the accuracy of habitat suitability models, while TSS normalizes total accuracy [12, 72].
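For readers less familiar with these metrics, the sketch below computes TSS, Cohen's Kappa, and AUC from observed presence/absence labels and predicted suitability scores, using their standard definitions. The data, the 0.5 threshold, and the function names are illustrative assumptions and do not reproduce the evaluation settings used in this study.

```python
def confusion_counts(observed, predicted_prob, threshold=0.5):
    """Count true/false positives and negatives at a given suitability threshold."""
    tp = fp = fn = tn = 0
    for obs, prob in zip(observed, predicted_prob):
        pred = prob >= threshold
        if obs and pred:
            tp += 1
        elif obs and not pred:
            fn += 1
        elif not obs and pred:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def tss(tp, fp, fn, tn):
    """True skill statistic: sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


def cohen_kappa(tp, fp, fn, tn):
    """Agreement between prediction and observation, corrected for chance."""
    n = tp + fp + fn + tn
    observed_agreement = (tp + tn) / n
    expected_agreement = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed_agreement - expected_agreement) / (1.0 - expected_agreement)


def auc(observed, predicted_prob):
    """Probability that a random presence outscores a random absence (ties count 0.5)."""
    pos = [p for o, p in zip(observed, predicted_prob) if o]
    neg = [p for o, p in zip(observed, predicted_prob) if not o]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical evaluation data: 1 = presence, 0 = absence.
observed = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

counts = confusion_counts(observed, scores)
print("TSS:", tss(*counts))
print("Kappa:", cohen_kappa(*counts))
print("AUC:", auc(observed, scores))
```

TSS and Kappa depend on the threshold used to convert suitability into presence/absence, whereas AUC is threshold-independent. The functions assume that both presences and absences occur in the evaluation data.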

We next assessed the indigenous nature of the species over time, across a range of bioclimatic scenarios, and over space, across the suitability classes, for the African continent and India. Besides the weights assigned to the suitability classes, the use of area as the denominator and the number of polygons as a multiplier in the index explains the higher indigeneity score observed in India than in Africa within the optimum class. The evidence indicates that this species occupies more contiguous and expansive areas across the African continent, whereas in India it is dispersed into several smaller meta-populations (groups of geographically isolated populations of the same species). These findings can be understood through population viability analysis theory, which posits that a densely concentrated population is more vulnerable to changes in its surroundings than a scattered one. Our results offer insight into this criterion for the species. In Africa, the optimum area under the various RCPs and time frames was smaller than the current optimum area (Table 4), with the exception of 2050 RCP 2.6. This trend was also observed for moderate areas. The converse was true for marginal habitats, suggesting a conversion of high-quality habitat into low-quality habitat. In India (Table 5), the trend was reversed: the area under the optimum class increased under RCP 2.6 and 4.5 in 2050 and under RCP 2.6 and 8.5 in 2070.
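The effect of the index structure described above can be illustrated with a toy calculation. The sketch below uses one hypothetical form that is merely consistent with the verbal description (class weight × number of polygons ÷ occupied area); the study's actual formula is defined in its methods and may differ, and all weights, polygon counts, and areas here are invented for illustration.

```python
def indigeneity_score(class_weight, n_polygons, area_thousand_sqkm):
    """Hypothetical index: class weight x number of occupied polygons / occupied area."""
    if area_thousand_sqkm <= 0:
        raise ValueError("area must be positive")
    return class_weight * n_polygons / area_thousand_sqkm


# Invented example: same class weight and polygon count, different occupied area.
contiguous = indigeneity_score(class_weight=3, n_polygons=40, area_thousand_sqkm=5000.0)
fragmented = indigeneity_score(class_weight=3, n_polygons=40, area_thousand_sqkm=500.0)

print("large contiguous range:", contiguous)
print("small fragmented range:", fragmented)
```

Under this assumed form, a region where the same number of occupied patches is spread over a smaller total area receives a higher score, which matches the direction of the India versus Africa contrast discussed above.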

Contrary to popular belief, species' geographic distributions are not fixed but change over time with environmental fluctuations [73, 74]. Incorporating the temporal dimension is crucial for accurate niche modelling, as shown by Feng et al. [75], and it has significant implications for the reproducibility of results. In this study, we described the ecological niches identified for I. oblongifolia, taking into account its trajectory across climatic time periods and its spatial distribution, from its global spread down to the continental and national scales.

The current study examined the threshold limits of the bio-climatic predictors, across time frames and emission scenarios, in relation to the distribution patterns of this species. This was done through algorithm-based modelling, specifically ROC curves, and by invoking the niche-centroid hypothesis, which posits that the optimal conditions of a fundamental ecological niche support higher reproduction and survival rates among individuals. In essence, the abundance of a species is inversely related to its distance from the fundamental niche centroid [76, 77]. We therefore identified the predictors that are effective in maintaining the fundamental and realized niches of the species. A higher centroid value indicates niche expansion, or the realization of the species' potential niche, coupled with lower abundance. Conversely, a lower centroid value suggests that a predictor helps maintain the fundamental niche areas, resulting in higher species abundance [11, 78].
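The niche-centroid idea can be made concrete with a small calculation of how far a site's environmental conditions lie from the centroid of the occupied environmental space. The sketch below uses a standardized Euclidean distance (each variable scaled by its standard deviation), which is a simplification of the Mahalanobis distance commonly used with ellipsoid niche models because it ignores covariance between variables. The variable choice, sample values, and function names are assumptions for illustration only.

```python
import math


def centroid_and_sd(rows):
    """Per-variable mean (niche centroid) and sample standard deviation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(d)
    ]
    return means, sds


def distance_to_centroid(point, means, sds):
    """Standardized Euclidean distance of one site from the niche centroid."""
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(point, means, sds)))


# Hypothetical occurrence conditions: (max temperature of warmest month in deg C,
# precipitation seasonality). Values are illustrative only.
occurrences = [(38.0, 95.0), (40.5, 120.0), (36.5, 88.0), (41.0, 140.0), (39.0, 101.0)]
means, sds = centroid_and_sd(occurrences)

for site in [(39.0, 108.0), (44.0, 180.0)]:
    print(site, round(distance_to_centroid(site, means, sds), 2))
```

Under the niche-centroid hypothesis, sites with a smaller distance would be expected to support higher abundance than sites far from the centroid.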

Our results show that precipitation seasonality (BC-15) is the most important factor affecting the optimum habitat suitability of this species, with values up to 150, for the current (Fig. 3a), 2050 RCP 8.5 (Fig. 3i), and 2070 RCP 2.6 (Fig. 4a), 4.5 (Fig. 4c), and 8.5 (Fig. 4g) scenarios. The multidimensional space of resource availability (niche hypervolume) for this species indicates that, with advancing climatic time frames and RCPs, the range of this parameter shifted from 79–176.9 to 85–196; this shift may therefore be linked to the changes in area under the optimum suitability class across these temporal and emission scenarios. Our ellipsoid niche modelling further specifies the ranges of these bioclimatic variables, which were up to 637 mm and 26.5–31.8 °C, respectively. On this basis, we can link these ranges to the increase in area under the optimum category, to 7638.67 Sq. km × 10³ and 6443.37 Sq. km × 10³ under these two RCPs of 2050, compared with the current optimum area of 5859.86 Sq. km × 10³. Similarly, mean temperature of the driest quarter (BC-9) and mean temperature of the warmest quarter (BC-10) were found to be most influential under RCP 6.0 of 2050 (with ROC 25 to 30 °C) and 2070 (with ROC 34 °C).

Livestock population in the habitats of I. oblongifolia was associated with a larger area under the optimum class than the various bioclimatic time frames and RCPs. This can be attributed to frequent livestock trampling, which increases soil bulk density, thereby promoting soil aeration and ultimately a higher probability of germination. This finding aligns with the research of Mathur and Sundarmoorthy [31]. Further, I. oblongifolia is propagated from seed. Flowering and fruiting occur from late September to roughly the middle of March [23]. Fruits that were mature in April had a better germination rate than those that were immature in June. At 35 °C, 82% of newly harvested seeds germinated, while only 15% of seeds stored for two years did so. Seed viability is affected by a number of variables, such as the success rate of germination and the length of time the seeds are kept in storage.

Further, the highest area under the optimum suitability class was recorded with Global Human Modification of Terrestrial Ecosystem (GHMTE) and with nitrogen and phosphorus fertilizer application rates (in kilogrammes per hectare). These results indicate a facilitating effect of these nutrients on the population dynamics of this species. The positive impact of GHMTE on the area under the optimum habitat class can be explained by the Intermediate Disturbance Hypothesis (IDH) and the Dynamic Equilibrium Model (DEM). The IDH predicts large species numbers at intermediate levels of disturbance, and the DEM predicts that the effect of disturbance depends on the level of productivity [79]. In short, these classical ecological approaches suggest that an intermediate disturbance frequency supports larger species populations because of a mix of good colonizer and good competitor species. Population fluctuations in I. oblongifolia may be less sensitive to human developmental activities because it is typically found in sandy, arid, and semi-arid regions of the world, where biotic activities and other factors associated with habitat destruction are less common than in other, more densely populated and resource-rich regions.

5 Conclusion

The success of conservation and management efforts depends on researchers' ability to track species' movements and predict how climate change and other predictors will affect the suitability of existing habitats. Our analysis highlighted the importance of species-specific knowledge of eco-physiological processes, such as light energy budgeting and the relationship between moisture and plant performance, for accurate species distribution modelling. Arid and semiarid species can benefit greatly from this physiological perspective because of the pulsed nature of resource release in these environments. Such pulse (rain), inter-pulse (cold period), and non-pulse (summer) events often trigger significant eco-physiological changes within species and could be a deciding factor for the role of a species within the plant community. Many questions regarding I. oblongifolia remain unanswered, as the vast majority of research pertains to the pharmacological potential of this species. Much therefore remains to be learned about its ecology and physiology (competition and interaction with other community associates). Using a global perspective, we were able to better understand the species' distribution, and we made an effort to link the species' likely responses to both current and future predictors.