Introduction

The extinction risk of a species is linked, among other factors, to its limited plasticity and capacity to adapt to rapid changes in environmental factors (e.g., temperature or precipitation; Walther et al. 2002). Slow adaptation to change (i.e., physiological and genetic responses) could be yet another limitation of species included in some risk categories (Bradshaw 1965; IUCN 2017), because some environmental variables fluctuate faster than the generation period of many species (Clements et al. 2004), which could result in a late genetic response and restricted geographic distribution.

Identifying variables that limit or enhance the presence or abundance of species with high conservation or economic values is a valuable step in characterizing their habitat (Antúnez et al. 2017b), given the complex task of elucidating the behavior of each species in a multidimensional space (Hutchinson 1957; Zhu et al. 2021). Such information is essential for adequate monitoring of natural populations of high ecological, social, economic or any other value (NOM-059 2010), for designing and promoting rational use and conservation actions, such as assisted migration in the face of progressive changes in the environment (Rice and Emery 2003; Sáenz-Romero et al. 2012; Gómez-Ruiz et al. 2020).

At present, information concerning sensitivity of priority species for conservation (endemic, danger of extinction or rare) to the individual fluctuations of some covariates is insufficient. For example, it is not known which of the species listed in the Official Mexican Standard for native species in a given risk category (NOM-059 2010), or in the IUCN Red List, are the most vulnerable to changes in the precipitation regime of any specific period of the year (e.g., April–September, the growing season precipitation), or to the reduction of the global average precipitation, which seems to be up to 3% in subtropical land areas (Watson and Albritton 2001). Nor has the response of these species to the change of a temperature variable been described (e.g., the minimum or maximum temperature of the warmest month).

A methodological approach which reinforces the biogeographic knowledge of plants, including the fundamental ecological niche, is the correlational study between bioclimatic variates and an abundance indicator (Soberón et al. 2017). In this context, studies related to species distribution and abundance are often approached mainly from two perspectives: (i) physical-geographical space and (ii) non-physical space (Hutchinson 1957; Colwell and Rangel 2009), which correspond to the “area of distribution” and “ecological niche” in species distribution models (SDM) (Soberón et al. 2017).

In this study, we analyzed 25 species of trees and shrubs of high conservation priority in Mexico (endemic, rare, endangered or subject to special protection) and a total of 25 climatic and physiographic variables. The relative abundance relationships of each species and environmental variables were examined by identifying different statistical association indicators. To this, several analytical techniques were used, including covariation analysis by permutations, Boruta wrapper algorithm, multivariate analysis and regression analysis. The selected techniques provide quantitative indicators of the possible degree of individual incidence of the variables on the relative abundance of the species from different perspectives, including the strength and direction (positive or negative) of the relationship between the variables studied and an index or importance value of each predictor.

Materials and methods

Sampling area and data collection

We utilized data from 3610 sampling units (clusters), which were collected by the National Forestry Commission (CONAFOR) during the National Inventory of Forests and Soils (INFyS) for the 2004 − 2009 period. These sampling units were stratified and systematically distributed, trying to cover the most important wooded areas of Mexico. Each main sampling unit (cluster) consisted of four subunits of 400 m2 each; the clusters were equidistantly distributed every 5 km in temperate zones and tropical and subtropical regions, every 10 km in semi-arid communities and every 20 km in dry regions (CONAFOR 2009). Sampling and data record details on the main units and subunits can be found in the National Forest Inventory manual (CONAFOR 2009) (Fig. 1).

Fig. 1
figure 1

Network of sampling units, where at least one of the 25 species studied were observed. This map is an adaptation of the potential vegetation map for Mexico by Rzedowski (1990), published by the National Commission for the Knowledge and Use of Biodiversity (CONABIO 2001)

Species and variables studied

A total of 25 species of high conservation value and classified in some risk category by the Official Mexican Standard (NOM-059-SEMARNAT) were selected (NOM-059 2010). Seventeen of the studied species (68%) appear also in one of the categories of the red list of the IUCN (2017) (Table 1).

Table 1 Species studied, their classification categories according to the NOM-059 and the IUCN list, and vegetation type based on the climatic region

Twenty-five explanatory variables were modeled, including maximum, minimum and average temperatures; precipitation at specific periods; freezing dates; and some physiographic variables (Table 2). We obtained the climatic variables for each sampling plot from the USDA Forest Service Rocky Mountain Research Station Modeler (Rehfeldt 2006; Rehfeldt et al. 2006), which uses climate data for a 30-year period (1961−1990), with records of over 6000 weather stations in Mexico, South America, Guatemala, Belize and Cuba (Rehfeldt 2006; Crookston et al. 2008; Sáenz-Romero et al. 2010). The average slope of the terrain, the geographic aspect and elevation, important factors in the species' distribution (Kebede et al. 2013), were registered in the field (CONAFOR 2009) (Table 2). To integrate geographic exposure into quantitative analyses, each geographic orientation of the main sampling plot was coded with numbers from one to nine: zenith = 1, north = 2, south = 3, east = 4, west = 5, northeast = 6, southeast = 7, northwest = 8 and southwest = 9. Numerical coding was done during the gathering of field information (CONAFOR 2009).

Table 2 Variables used as predictors in the analyses performed for this study

In this study, the relative population density per plot was used as an indicator of the abundance, whose value was obtained by dividing the number of individuals recorded in each plot of a given species by the total number of individuals of that species in all plots (Brower et al. 1998).

Data analyses

Pearson, Kendall and Spearman correlations analyses; principal component analysis (PCA); linear regression analyses (LRA); regression trees by random forest (RA-FOR); Boruta wrapper algorithm (BA); and a measure of covariation (C) were used as described by Gregorius et al. (2007) and Gillet and Gregorius (2008). In this last method, with 10,000 permutations, the abundance of each species was correlated with each of the variables, assuming the rest remained constant. The importance of the selected methods lies in (1) providing quantitative indicators on the strength and direction (positive or negative) of the relationships between the variables studied; (2) identifying those that explain the highest percentage of variability in the data of interest; and (3) providing values or indices of importance based on learning algorithm.

According to Kursa and Rudnicki (2010), the Boruta algorithm iteratively compares the importance of certain attributes with the importance of shadow attributes, which are created by shuffling the original ones. With the PCA, we identified the variables that explained the highest percentage of variability in the data (Dormann et al. 2007). We used a regression analysis to identify evidence of a causal relationship, but the results were not robust, and therefore, are not reported here. Likewise, the correlation coefficients of Pearson, Kendall and Spearman were excluded because they were not significant. All analyses were done using R ver. 3.4.0 (R Core Team 2017).

Selection in RA-FOR was based on values from a statistical dispersion measure called the Gini coefficient or Gini index (Cutler et al. 2007). This index is effective and simple to estimate, and its value not only focuses on the heterogeneity reduction produced by a given variable, but also makes a correction to the bias produced in the model (Sandri and Zuccolott 2008). High Gini index values indicate a greater contribution of a given independent variable to explain the variation of the dependent variable (Cutler et al. 2007; Sandri and Zuccolotto 2008; Breiman and Cutler 2017). In the Boruta algorithm, the variables were filtered by the variable importance measure (VIM) (Kursa and Rudnicki 2010).

For evaluating if the high importance variables are the same in different types of vegetation, they were categorized into four groups: (1) Variables with strong evidence; those occupying the first five places when sorted in descending order of importance in four analyses techniques. (2) Variables with medium evidence; those that, sorted in descending order, occupied the first five places according to the order of importance in three analyses techniques. (3) Variables with poor evidence; those occupying the first five places when sorting them in descending order in one or two techniques. (4) Variables with very poor evidence; those that did not appear among the first five places in order of importance in any of the techniques tested or that were not statistically significant.

Results

What variables showed the highest importance indexes?

The order of importance of the variables differs for each species and vegetation type. For example, according to the random forest method ELEV and ASL were detected as high-relevant variables for Cryosophila argentea (a tropical species), Olneya tesota (a dry region plant) and for Pseudotsuga menziesii (a temperate species) (Fig. 2; Table S1). In contrast, variables showing weak evidence of relationship to the relative abundance of these species were: MMINDD0, D100 and DD0 (Table S1). When removing ELEV from the analyses, assuming that it could dilute the contribution of other variables, very few changes in the order of importance were observed in most cases (Table S2). Most of the replacements or alterations were observed in the results of the random forest method. Even so, 76% of the species retained at least three of the first five variables that headed the order of importance (Tables S1 and S2).

Fig. 2
figure 2

The five variables with the highest values of importance according to random forest method for Cryosophila argentea, Olneya tesota and Pseudotsuga menziesii. The lines at the top of the bars are standard errors. ELEV is elevation above sea level; ASL, average slope of the terrain; GSDI, growing season dryness index; SDI, summer dryness index (GSDD5)0.5/GSP; MAP, mean annual precipitation; PRATIO, GSP/MAP; WINP, winter precipitation

What types of relationships were observed?

In most cases, poor linear relationships were observed among the variables and the relative abundance of the species, showing small coefficients or being not significant (P > 0.05). Regardless of the degree of correlation, most of the coefficients were positive. For example, the abundance of Carpinus caroliniana, Astronium graveolens and C. argentea were negatively correlated with GSP (Fig. 3a and b). However, in some species, such as Tilia americana (var. mexicana), P. menziesii or Thrinax radiata, a positive correlation was observed among the same variable and the relative abundance (Fig. 3a and b).

Fig. 3
figure 3

a Representation of the summer precipitation (SMRP), elevation above sea level (ELEV) and growing season precipitation (GSP) in the factorial plane for three species of temperate regions; b representation of the mean temperature in the coldest month (MTCM), growing season precipitation (GSP) and mean annual precipitation (MAP) in the factorial plane for three species of tropical regions, with their respective correlation coefficients based from the measure of covariation with 10,000 permutations

Was any pattern of response identified?

The individual effect of each variable changed in both intensity and magnitude, contrasting in many cases (Table S4), even when comparing among species of the same vegetation type, as in the case of C. caroliniana whose response was negative to variation in summer precipitation (SMRP), while P. menziesii showed a positive relation to SMRP (Fig. 3a; Table S4).

In a species-by-species assessment based on the PCA results, variables GSDD5 and DD5explained the highest percentage of variability in the data, followed by MAP, GSP and ELEV, which usually appear among the first five most important variables in the first dimension of the principal components, accounting for over 80% of the studied species (Fig. 4a and b; Table S3).

Fig. 4
figure 4

Two-dimensional PCA ordination of the first two principal components as a function of the explanatory variables: a results for Carpinus caroliniana, Pseudotsuga menziesii and Tilia americana, and b results for Astronium graveolens, Cryosophila argentea and Thrinax radiata. The ellipses show the covariance structure for each species, while the vectors show the projections of the original data

When valuing the results of all the analyses techniques applied in this work, excluding ELEV (an underlying climate factor), there is strong evidence that MAP, GSDI, SMRP, FFP, SPRP, WINP, GSP, GSDD5 and DD5 had influence more on the species that grow in the tropical and warm regions. The measure of heating degree-days above 5 °C, SMRP, GSP, MAP, MTCM, MMAX and SDAY showed medium evidence on the species belonging to the temperate forest, while MMIN and MMAX were the most relevant for species from dry regions, although for this latter case, the evidence was moderate (Tables S1, S2, S4).

Discussion

Response of the species to unitary change in variables

Considering the changing global climate, it is important to diagnose the spectrum of variables that will enhance or inhibit the decline in plant diversity, from a unitary and multivariate perspective. The results suggest that the variation of some variables can positively influence the abundance of a species and also affect other species of the same ecological affinity (Figs. 3 and 4a; Tables S1, S3 and S4). For example, with an increase in the precipitation from April to September (GSP), the most favorably affected tropical species, among those studied, would be Astronium graveolens, Avicennia germinans and Vatairea lundellii with absolute values of C equal to or greater than 0.94 (Table S4). On the contrary, the same variable has a poor correlation with the abundance of Guaiacum sanctum, Sideroxylon capiri or Olneya tesota (absolute values of C equal to or less than 0.22) (Table S4), which confirms that each climatic variable affects, in a different magnitude and intensity, the presence or abundance of a species (Thuiller et al. 2004; Toledo et al. 2012; Martínez-Antúnez et al. 2013).

Similarly, the correlation values of species abundances regarding the different temperature records (minimum, maximum and average) fluctuated, suggesting that, if a causal connection exists, it is irregular, heterogeneous without a defined single tendency (Table S4). For example, although most correlation coefficients were small, increasing MTCM would negatively affect A. graveolens, A. germinans or Bursera coyucensis (a negative and significant correlation was found), but would positively affect the relative abundance of V. lundellii, Lophocereus schottii or Cryosophila argentea (Table S4).

Role of variables from a multivariate perspective

When all the environmental variables were included in the analyses, we could not determine clearly whether the abundance increased or decreased as the value of a variable changed. Such was the case for the variables related to the amount of heat available (important for the growth and development of plants species) based on mean monthly temperature above 5 °C (GSDD5 and DD5), which appear in the first group of components of the PCA accounting for more than 95% of the species (Table S3), but their contributions in the random forest models and the correlation coefficients were not always significant (Tables S1, S4). Instead, winter precipitation (WINP), mean annual precipitation (MAP) and the physiographic variables, showed signs of a high causal relationship on at least 32% of the total species studied, according to the random forest.

Therefore, a single and categorical conclusion cannot be made from the multivariate analyses because their results do not always coincide. This result might be due to the fact that each method is based on different assumptions and that each method assesses a species’ response to variation in predictors from different perspectives. For example, PCA is designed to identify the variables that explain the highest percentage of variability of the data from a multivariate perspective excluding autocorrelation, and RA-FOR tries to detect the most relevant variables of a random vector whose selection is based on the degree of contribution of each variable to decrease the error, assuming that each one is independent of the other.

The negligible contribution of the values of the index of the amount of heat available calculated from the temperature above 5 °C in most of the random forest models (unlike PCA analysis), might suggest an indirect effect or a nonlinear relationship. In similar studies, using correlative models, DD5 has been reported as one of the most important climatic variables for several species of ecological or economic interest, such as Agave cupreata (Sáenz-Romero et al. 2010) or Pinus strobiformis, Pseudotsuga menziesii and Pinus arizonica (Martínez-Antúnez et al. 2013).

Other tools that may be applied are the regression techniques excluding the collinearity, spatial and temporal autocorrelation (Raxworthy et al. 2003; Martínez-Meyer and Peterson 2006), but it is very probable that each technique will continue to give different results. So, the methods where a single environmental variable is included (under a scenario of absence of spatial and temporal autocorrelation) could be more useful, in certain cases, to identify the degree of association or correspondence between two variables. Table S4 shows the variables that have high correlation (positive or negative) with each of the species. Certainly, combination of methods (univariate and multivariate) might give a broader and clearer panorama on the relationship among each of the most relevant environmental variables and the abundance of species (Ter Braak 1986; Thuiller et al. 2004; Martínez-Antúnez et al. 2015).

Considerations and possible applications

In this study, we tried to reduce the probability of a variable being misclassified as a high importance variable (Tables S1–S4) by contrasting the results of several techniques simultaneously, since all models have limitations and they each have a degree of uncertainty (Antúnez et al. 2017a). For example, a correlative model does not take into account variable interactions or human activities that are increasingly common (Hunter 2007; Crowther et al. 2015). In addition, the level of incidence that human activities have on each species (in quantitative terms) is unknown, making it difficult to include them in the models with the appropriate weighting.

Likewise, there are other factors not taken into account, such as the properties of the soil and other physiographic variables (Berg and Smalla 2009; Webb and Peart 2000) and the presence/absence of invasive plants, which in many cases also play an adverse role for native populations in danger of extinction (Mooney and Cleland 2001; Traveset and Richardson 2006), although the intensity of affectation varies by species (Gurevitch and Padilla 2004). Furthermore, the presence or absence of factors and elements can enhance or inhibit the impact of a variable and cause a chain reaction, like a change in the regional pluviometric regime as a result of fluctuations in continental precipitation (Watson and Albritton 2001).

At present, Mexico is home to globally significant biodiversity (Sarukhán et al. 2015). To sustain this biodiversity in the long term, the findings reported here could contribute to forming a scientific basis for strengthening the conservation actions and strategies implemented by the National Commission of Natural Protected Areas of Mexico. The inclusion of more taxa, listed in NOM-059 (2010) and IUCN red list, in the Action Programs for the Conservation of Species (PACE) is a pertinent action strategy, prioritizing those that are sensitive to extreme temperature variations, as they are most at risk in the face of climate change (Mo et al. 2019). For example, A. graveolens, A. germinans, V. lundellii, L. schottii and Guaiacum coulteri all showed signs of high sensitivity to changes in the average temperature in both the coldest month and the warmest month (Table S4).

Current action programs, in particular the Program for the Conservation of Species at Risk (PROCER), might also consider aspects like the most relevant variables based on the values or indices of importance, the types of associations (increasing, decreasing, linear or curvilinear), and environmental values where the maximum probability of abundance occurs (Antúnez et al. 2017b; Antúnez 2021) to characterize and delineate current habitats and identify places susceptible to be occupied in the future. In general, the identification of the most important variables for the abundance of plants classified in some risk category is an important step in defining their actual environmental tolerance (Table S1, S2, S4) and could be used as a preliminary indicator of species' sensitivity to climate change (Hannah et al. 2002).

Conclusions

Our findings reveal that, the most important variables for most species, regardless of the type of forest they belong to—and excluding altitudes—are the amount of precipitation in specific periods (mainly winter precipitation, growing season precipitation, and mean annual precipitation), the average slope of the terrain, and the dominant geographical aspect. We observed a significant change in the order of appearance of the variables (in their descending order), according to the values of importance of each method, but without a clear and conclusive pattern for each vegetation type, was because of the low number of species analyzed for some vegetation types (temperate forest and dry climate). Therefore, continuing and future efforts to find, conserve or enable suitable habitats (natural or artificial) for these species should focus on the variables with strong evidence of an impact on each species, taking into account both the analyses from a multiple perspective (Tables S1, S3) and the univariate results (Table S4).