Study area
The study area extends between 0° and 30° N latitude and 80° E–120° E longitude, and covers a broad altitudinal range, from coastal areas to Himalayan peaks above 8000 masl (Fig. 1). The area encompasses all or part of 13 countries: Bangladesh, Bhutan, Cambodia, Laos, Myanmar, Nepal, Singapore, Taiwan, Thailand and Vietnam fall entirely within the study area, while China, India and Malaysia are partially encompassed.
Following the Köppen–Geiger climate classification (Beck et al. 2018), most of the study area falls within the tropical and humid-subtropical climatic zones, although the northernmost areas lie in the subtropical highland climatic zone, and the highest Himalayan regions in the tundra and subarctic climatic zones. This climatic diversity is reflected in the variety of ecosystems in the region, which include nine of the fourteen biomes described by Olson et al. (2001). In turn, this richness of ecosystems and habitats is associated with the uniquely rich biodiversity that characterises the area.
Southeast Asia, however, has also been characterised, for at least the last 70 years, by intense human population growth and anthropogenic landscape change. The total human population of the countries encompassed in the region (excluding China and India) grew from 130 million people in 1950 to more than 500 million at present (United Nations 2019). Furthermore, the population is expected to keep growing for at least the next 30 years. Associated with this striking human population growth, Southeast Asia has experienced severe rates of forest loss, among the highest globally (Sodhi and Brook 2006). Overall, the countries encompassed in the study area (excluding China and India, but including Malaysia) lost more than 25 Mha of forest between 2001 and 2019, with the greatest losses during this period being 28% for Malaysia (including the insular states) and 26% for Cambodia (Global Forest Watch 2020; Hansen et al. 2013).
Data collection
Between 2008 and 2016, our field teams carried out systematic camera trap surveys across Southeast Asia, covering the full mainland range of the clouded leopard (Neofelis nebulosa). Sampling occurred mainly in national parks and reserves, and spanned a broad altitudinal range, from 45 masl in Pang Sida National Park, in Thailand, to 3901 masl in Langtang National Park, in Nepal. Camera trap stations were set 1.0–2.0 km apart, with two cameras per station at ~ 40 cm above the ground, and were deployed along forest trails, natural ridgelines and disused logging roads to maximize detection success of large felids (Macdonald et al. 2018). Although designed for large felids, the sampling protocol also captured a rich dataset of regional biodiversity. All terrestrial mammals, birds and reptiles that were unambiguously identified to species were included in the analysis. When a species could not be clearly identified, we retained the data at a broader taxonomic level (i.e., order, family or genus). In addition, we included data related to human activities captured by camera traps, such as people and domestic animals. These data were used solely for assessing the habitat factors driving species assemblages, as the influence of anthropic disturbances on habitat selection by terrestrial species is clearly relevant, but were not incorporated in models of biodiversity distribution. For each species, we used the number of detections per camera trap station, applying a filtering method to ensure the independence of data and to reduce overestimation bias: we discounted records of the same species at the same camera trap station within 1 h, except when animals were individually recognizable and when genders and/or age classes were unambiguous.
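The 1 h independence filter can be illustrated with a minimal Python sketch. It is not the code used in the study: the record layout, station identifiers and species labels are hypothetical, the sketch assumes that each record is compared with the last retained record of the same species at the same station, and it does not implement the exception for individually recognizable animals.

```python
from collections import Counter
from datetime import datetime, timedelta


def filter_independent(records, window=timedelta(hours=1)):
    """Keep a record only if at least `window` has elapsed since the last
    retained record of the same species at the same station (assumption)."""
    last_kept = {}
    kept = []
    for station, species, timestamp in sorted(records, key=lambda r: r[2]):
        key = (station, species)
        previous = last_kept.get(key)
        if previous is None or timestamp - previous >= window:
            kept.append((station, species, timestamp))
            last_kept[key] = timestamp
    return kept


def detections_per_station(records):
    """Count retained detections for each (station, species) pair."""
    return Counter((station, species) for station, species, _ in records)


# Toy records with made-up station IDs, species labels and timestamps.
raw = [
    ("S01", "species_A", datetime(2012, 3, 1, 10, 0)),   # kept
    ("S01", "species_A", datetime(2012, 3, 1, 10, 30)),  # 30 min later: dropped
    ("S01", "species_A", datetime(2012, 3, 1, 11, 5)),   # 65 min after last kept: kept
    ("S01", "species_A", datetime(2012, 3, 1, 11, 10)),  # 5 min later: dropped
    ("S01", "species_B", datetime(2012, 3, 1, 10, 10)),  # different species: kept
    ("S02", "species_A", datetime(2012, 3, 1, 10, 20)),  # different station: kept
]
counts = detections_per_station(filter_independent(raw))
```

Comparing against the last retained record, rather than the immediately preceding one, prevents a long run of closely spaced photographs from being discarded in its entirety.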
Landscape covariates
We selected a preliminary set of 28 covariates covering a broad range of habitat gradients to investigate habitat requirements for the sampled species (Hughes 2017a). We included twelve landscape, four anthropic, three topographic, one climatic and eight spatial covariates. To obtain more biologically meaningful derivatives of the original covariates, we transformed them into 46 covariates by applying composition (i.e., class proportion on the landscape) and configuration (i.e., landscape continuity) metrics using FRAGSTATS (McGarigal et al. 2012) (Table S1).
We obtained the original raster layers of the preliminary covariates from different sources and therefore they had different spatial resolutions and projections. To harmonise the raster layers, we followed the framework recently used in our work (Macdonald et al. 2019), where we used a similar set of covariates. We first re-projected all the layers to Asia South Albers Equal Area Conic projection in ArcMap v10.6.1, by applying a nearest neighbour re-sampling technique for discrete layers and a bilinear interpolation re-sampling technique for continuous layers. Then, by applying the same re-sampling techniques, we re-sampled all raster layers to 250 m resolution, a resolution commonly used to model habitat suitability and biodiversity hotspots at continental and global extents (Rondinini et al. 2011).
Species select their environmental resources and conditions at different spatial scales (Macdonald et al. 2018, 2019; McGarigal et al. 2016). To investigate scalar relationships between sampled species and covariates, we calculated each metric at eight different scales, by using circular buffers of 250 m, 500 m, 1000 m, 2000 m, 4000 m, 8000 m, 16,000 m and 32,000 m radii, centred on each camera trap location.
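As an illustration of the multi-scale calculation, the following Python sketch computes a simple composition metric (class proportion) within circular buffers of increasing radius around a point. It is a toy stand-in for the FRAGSTATS workflow, not a reproduction of it: the grid, the cell size, the class codes and the function name are all hypothetical, and a cell is assumed to belong to a buffer when its centre falls within the radius.

```python
import math

# Buffer radii (m) used in the study.
RADII_M = [250, 500, 1000, 2000, 4000, 8000, 16000, 32000]


def class_proportion(grid, cell_size, centre_xy, radius, target_class):
    """Proportion of cells inside a circular buffer that belong to
    `target_class`; a cell is 'inside' if its centre is within `radius`."""
    cx, cy = centre_xy
    inside = hits = 0
    for row_idx, row in enumerate(grid):
        for col_idx, value in enumerate(row):
            x = (col_idx + 0.5) * cell_size
            y = (row_idx + 0.5) * cell_size
            if math.hypot(x - cx, y - cy) <= radius:
                inside += 1
                hits += value == target_class
    return hits / inside if inside else float("nan")


# Toy 4 x 4 land-cover grid with 250 m cells: 1 = forest, 0 = other.
toy_grid = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
station = (500.0, 500.0)  # hypothetical camera location at the grid centre

forest_by_scale = {
    r: class_proportion(toy_grid, 250.0, station, r, target_class=1)
    for r in RADII_M
}
```

In this toy landscape the forest proportion falls from 1.0 at the 250 m radius to 0.25 once the buffer covers the whole grid, which shows why the same covariate can carry different information at different scales.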
Covariate selection and variance partitioning
Since zero-inflation of explanatory variables is likely to cause inaccurate parameter estimates and unreliable inferences (Martin et al. 2005), we removed poorly sampled covariates occurring at < 10% of camera stations to avoid assessing unrepresentative habitat features. To identify the most representative scale of each covariate for the sampled species, we performed Canonical Correspondence Analysis (CCA) (McGarigal et al. 2000; ter Braak 1986) independently at each scale for each covariate, using the vegan package (Oksanen et al. 2018) in R v3.5.1 (R Core Team 2018). For each covariate, we retained the scale whose univariate CCA showed the highest canonical eigenvalue (Borcard et al. 1992).
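The two screening steps can be sketched in Python as follows. The sketch assumes that a covariate "occurs" at a station when its value is non-zero, and it takes the canonical eigenvalues as given inputs rather than computing the CCA itself; the covariate names and all numbers are invented for illustration.

```python
def prevalence(values):
    """Fraction of stations at which a covariate is non-zero (assumption:
    'occurring' means a non-zero value at the station)."""
    return sum(v != 0 for v in values) / len(values)


def drop_rare(covariates, min_prevalence=0.10):
    """Remove covariates that occur at fewer than `min_prevalence` of stations."""
    return {
        name: values
        for name, values in covariates.items()
        if prevalence(values) >= min_prevalence
    }


def best_scale(eigenvalue_by_scale):
    """Return the scale (buffer radius) with the highest canonical eigenvalue."""
    return max(eigenvalue_by_scale, key=eigenvalue_by_scale.get)


# Toy covariate values at 20 stations (made-up names and numbers).
toy_covariates = {
    "rare_cover": [1] + [0] * 19,     # present at 5% of stations: removed
    "common_cover": [1, 1] + [0] * 18,  # present at 10% of stations: retained
}
retained = drop_rare(toy_covariates)

# Hypothetical univariate CCA eigenvalues for one covariate at three radii (m).
toy_eigenvalues = {250: 0.12, 2000: 0.31, 16000: 0.27}
selected_scale = best_scale(toy_eigenvalues)
```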
We then assessed multicollinearity by calculating Pearson’s correlation coefficient between all covariate pairs. When two covariates were highly correlated (|r| ≥ 0.7), we dropped the covariate whose univariate CCA showed the lower adjusted R2 (Guisan and Zimmermann 2000). We selected the final set of covariates by performing forward selection for each group of covariates, retaining only the significant ones (p < 0.001) (Cushman and McGarigal 2004).
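The collinearity screen can be sketched in Python as below. The order in which correlated pairs are resolved is not specified in the text, so the sketch assumes a simple greedy pass over pairs in alphabetical order; the covariate names, values and adjusted R2 scores are invented.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_collinear(data, score, threshold=0.7):
    """For every pair with |r| >= threshold, drop the covariate with the
    lower score (here, a hypothetical univariate adjusted R2).
    Greedy, order-dependent pass: an assumption of this sketch."""
    kept = set(data)
    for a, b in combinations(sorted(data), 2):
        if a in kept and b in kept and abs(pearson(data[a], data[b])) >= threshold:
            kept.discard(a if score[a] < score[b] else b)
    return sorted(kept)


# Toy covariates at five stations (made-up values).
toy_data = {
    "forest": [1, 2, 3, 4, 5],
    "forest_alt": [2, 4, 6, 8, 11],  # nearly collinear with "forest"
    "elevation": [5, 1, 4, 2, 3],
}
toy_score = {"forest": 0.30, "forest_alt": 0.10, "elevation": 0.20}
final_covariates = prune_collinear(toy_data, toy_score)
```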
It should be noted that the preliminary covariates were selected to represent habitat gradients wide enough to evaluate biodiversity distribution. The additional steps, namely assessing composition and configuration metrics of the original covariates and evaluating their representative spatial scales, were performed to analyse how, and at what scale, habitat factors that we had already identified as fundamental for biodiversity affected its geographic distribution.
To investigate the relative contribution of each group of covariates, we performed a variance partitioning analysis (Borcard et al. 1992) using the vegan package. Variance partitioning quantifies the independent contribution of each group of covariates to the global variance explained, as well as the shared variance explained by interacting combinations of covariates (Borcard et al. 1992; Cushman and McGarigal 2004).
Modelling species richness
Since our data were counts of detections of each species at each camera station, we modelled species richness by first fitting a Poisson generalized linear model (GLM) for each species, using covariates at their representative spatial scales, in R v3.5.1. GLMs are commonly used regression models that allow the response variable to follow distributions other than the normal (Guisan and Zimmermann 2000), and the Poisson distribution is used when the response variable consists of abundance data (Vincent and Haworth 1983), as in our case. Projected models were reclassified to binary form, with pixels with zero or negative values treated as absences and pixels with positive values as presences. Finally, the single-species presence-absence maps were summed to predict species richness (Grand et al. 2004).
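The reclassification and stacking steps can be illustrated with a short Python sketch. It operates on small nested lists standing in for projected raster layers; the layer values and function names are hypothetical, and the GLM fitting itself is not shown.

```python
def to_presence(layer):
    """Reclassify a projected layer to binary form:
    positive values -> 1 (presence), zero or negative values -> 0 (absence)."""
    return [[1 if value > 0 else 0 for value in row] for row in layer]


def stack_richness(layers):
    """Sum single-species presence-absence maps, pixel by pixel."""
    binaries = [to_presence(layer) for layer in layers]
    return [
        [sum(cells) for cells in zip(*rows)]
        for rows in zip(*binaries)
    ]


# Two toy 2 x 2 projected layers (made-up values).
species_a = [[0.5, -0.1],
             [0.0, 2.0]]
species_b = [[1.2, 0.3],
             [-1.0, 0.0]]

richness = stack_richness([species_a, species_b])
```

Each pixel of the resulting map holds the number of species predicted present there, which for these toy layers ranges from 0 to 2.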
To evaluate the performance of the multi-species model, we trained the models with 80% of camera trap stations and used the remaining 20% for validation. The multi-species model was validated by fitting a GLM between the modelled and the empirical number of species sampled at camera stations, and calculating the Nagelkerke pseudo-R2. The Nagelkerke pseudo-R2 (Nagelkerke 1991) is an index, ranging from 0 to 1, that provides a measure of the goodness-of-fit of logistic regressions. Unlike linear regression, for which R2 is a true measure of goodness-of-fit that quantifies the variance explained by the model, logistic regression has no equivalent measure. However, pseudo-R2 is a relative measure of how well a model explains the data, and can be used to compare different models. Additionally, we compared the performance of the multi-species model with a “null” model obtained by summing the IUCN geographic range layers of the sampled species (IUCN 2019). The range layers considered were solely the polygons of the extent of occurrence (EOO) in which the species were considered extant and resident. We fitted a GLM between the number of species predicted by the IUCN model and the empirical number of species at camera stations, and calculated the Nagelkerke pseudo-R2.
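For reference, the Nagelkerke pseudo-R2 rescales the Cox and Snell index by its maximum attainable value, so that it ranges from 0 to 1. The Python sketch below computes it from the log-likelihoods of a fitted model and of the intercept-only model, and also shows a random 80/20 split of stations. The function names, the seed and the log-likelihood values are hypothetical; the text does not state how the split was randomised.

```python
import math
import random


def split_stations(stations, train_fraction=0.8, seed=0):
    """Randomly split station IDs into training and validation sets."""
    shuffled = list(stations)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


def nagelkerke_r2(loglik_null, loglik_model, n):
    """Nagelkerke (1991) pseudo-R2 from the log-likelihoods of the
    intercept-only and fitted models, for n observations."""
    cox_snell = 1.0 - math.exp((2.0 / n) * (loglik_null - loglik_model))
    max_cox_snell = 1.0 - math.exp((2.0 / n) * loglik_null)
    return cox_snell / max_cox_snell


train, validation = split_stations(range(10), train_fraction=0.8, seed=1)

# Made-up log-likelihoods for a validation GLM on 50 stations.
r2_example = nagelkerke_r2(loglik_null=-120.0, loglik_model=-95.0, n=50)
```

The index is 0 when the fitted model is no more likely than the intercept-only model, and it increases as the fitted log-likelihood improves.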
Gap analysis and species importance
We quantified the amount of protected biodiversity as the ratio between the cumulative number of species predicted within protected areas (i.e., the sum, over all pixels within protected areas, of the number of species predicted in each pixel) and the cumulative number of species predicted in the study area (i.e., the same sum over all pixels in the study area) (Grand et al. 2004). This allowed us to evaluate the effectiveness of protected areas and to highlight where additional ones should be established to fill the gaps in habitat protection.
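The ratio can be made concrete with a minimal Python sketch, in which small nested lists stand in for the species richness raster and a binary protected-area mask. Both grids and the function name are hypothetical.

```python
def protected_share(richness, protected_mask):
    """Cumulative predicted richness inside protected pixels divided by
    the cumulative predicted richness over the whole study area."""
    total = sum(sum(row) for row in richness)
    inside = sum(
        value
        for rich_row, mask_row in zip(richness, protected_mask)
        for value, is_protected in zip(rich_row, mask_row)
        if is_protected
    )
    return inside / total


# Toy 2 x 2 richness map and protected-area mask (1 = protected).
toy_richness = [[2, 1],
                [0, 1]]
toy_mask = [[1, 0],
            [0, 1]]

share = protected_share(toy_richness, toy_mask)
```

In this toy case, three of the four pixel-level species occurrences fall inside protected pixels, giving a ratio of 0.75.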
We assessed the importance of sampled species as indicators of biodiversity by performing GLMs independently for each species. Predictor and response variables were random samples of points, in number proportional to the pixels representing the study area (1:1000), selected from the species richness model and from each single-species presence-absence model, respectively. We investigated deviance explained by each model and ranked each species according to its ability to predict overall biodiversity.
Drivers of biodiversity patterns
To evaluate how well our model predicted empirical species richness in each country, we fitted a GLM between the modelled and empirical number of species at the camera trap stations used for validation. Then, using the residuals at each location, we performed an ANOVA to test for significant differences between countries, followed by Tukey’s test to identify which countries differed significantly from the others.
We assessed the drivers of predicted biodiversity patterns by sampling 10,000 random points from the species richness model and from covariate layers to evaluate our four hypotheses (i.e., climatic, low human pressure, protected status and interactions between them). The tested covariates were mean annual temperature and mean annual precipitation (Fick and Hijmans 2017) for the climatic hypothesis, human footprint (WCS and CIESIN 2005) and roughness for the anthropic hypothesis, and protected areas (IUCN and UNEP-WCMC 2017) for the management hypothesis. The roughness layer was derived from a digital elevation model (Jarvis et al. 2008) by applying geomorphometric transformations (Evans et al. 2014), and was considered a proxy of landscape inaccessibility to human activities (Cushman et al. 2017). To re-project and re-sample the layers, we followed the same procedure that we employed to model species richness. Next, we fitted a linear model (LM) to assess the ecological relationships between modelled species richness and the aforementioned covariates. Last, we assessed changes in predicted species richness as a function of increasing each explanatory variable from the 10th to the 100th percentile, while holding all other covariates at their median value, to illustrate which covariate had the strongest effect on species richness (Wasserman et al. 2012).
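The percentile-based comparison of effect sizes can be sketched in Python as follows. The sketch assumes an additive linear model without interaction terms and takes its coefficients as given; the covariate names, coefficients, intercept and sampled values are all invented, and the percentile rule (linear interpolation between order statistics) is an assumption.

```python
import statistics


def percentile(values, q):
    """Percentile by linear interpolation between order statistics (q in 0-100)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def predict(coefficients, intercept, x):
    """Prediction of an additive linear model (no interactions: an assumption)."""
    return intercept + sum(coefficients[name] * x[name] for name in coefficients)


def effect_sizes(data, coefficients, intercept, q_low=10, q_high=100):
    """Change in predicted richness when each covariate moves from its
    q_low to its q_high percentile, with the others held at their medians."""
    medians = {name: statistics.median(values) for name, values in data.items()}
    effects = {}
    for name in coefficients:
        low = dict(medians, **{name: percentile(data[name], q_low)})
        high = dict(medians, **{name: percentile(data[name], q_high)})
        effects[name] = (
            predict(coefficients, intercept, high)
            - predict(coefficients, intercept, low)
        )
    return effects


# Toy sampled covariate values and made-up LM coefficients.
toy_points = {
    "temperature": list(range(11)),          # 0, 1, ..., 10
    "human_footprint": list(range(0, 101, 10)),  # 0, 10, ..., 100
}
toy_coefficients = {"temperature": 2.0, "human_footprint": -0.1}
effects = effect_sizes(toy_points, toy_coefficients, intercept=5.0)
strongest = max(effects, key=lambda name: abs(effects[name]))
```

Expressing each effect as a change in predicted richness over a common percentile range puts covariates measured in different units on a comparable footing.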