Predicting biodiversity richness in rapidly changing landscapes: climate, low human pressure or protection as salvation?

Rates of biodiversity loss in Southeast Asia are among the highest in the world, and the Indo-Burma and South-Central China Biodiversity Hotspots rank among the world’s most threatened. Developing robust multi-species conservation models is critical for stemming biodiversity loss both here and globally. We used a large and geographically extensive remote-camera survey and multi-scale, multivariate optimization species distribution modelling to investigate the factors driving biodiversity across these two adjoining biodiversity hotspots. Four major findings emerged from the work. (i) We identified clear spatial patterns of species richness, with two main biodiverse centres in the Thai-Malay Peninsula and in the mountainous region of Southwest China. (ii) Carnivores in particular, and large ungulates to a lesser degree, were the strongest indicators of species richness. (iii) Climate had the largest effect on biodiversity, followed by protected status and human footprint. (iv) Gap analysis between the biodiversity model and the current system of protected areas revealed that the majority of areas supporting the highest predicted biodiversity are not protected. Our results highlighted several key locations that should be prioritized for expanding the protected area network to maximize conservation effectiveness. We demonstrated the importance of switching from single-species to multi-species approaches to highlight areas of high priority for biodiversity conservation. In addition, since these areas mostly occur over multiple countries, we also advocate for a paradigmatic focus on transboundary conservation planning.


Introduction
Originally, 25 biodiversity hotspots were identified globally, collectively supporting 44% and 35% of the world's vascular plants and terrestrial vertebrates, respectively, in an area equal to 1.4% of the Earth's land surface (Myers et al. 2000). Among these hotspots, two important ones are the Indo-Burma and South-Central China Biodiversity Hotspots (Fig. 1). Together, they form a continuous area of almost 3,000,000 km 2 , encompassing much of Southeast Asia. This region is home to 10,500 endemic plants and 706 endemic vertebrates, representing 3.5% and 2.6% of global vascular plants and vertebrates, respectively (Myers et al. 2000). To these are added also the Sundaland Biodiversity Hotspot, only partially included in mainland Southeast Asia, and the Himalaya Biodiversity Hotspot, only later highlighted as a hotspot (Mittermeier et al. 2004).
In addition to being the most biodiverse hotspots, these four are also among the most threatened. The biological richness of Southeast Asia is endangered by a suite of anthropogenic threats, including land conversion and over-exploitation of natural resources, which are drastically modifying regional landscapes (Cushman et al. 2017;Hughes 2017b), driving some of the highest rates of habitat loss globally (Gaveau et al. 2016;Miettinen et al. 2011). Among the most impactful causes of land conversion in the region are deforestation for monocultures (Azhar et al. 2017), urbanisation (Zhao et al. 2018) and ancillary enterprises related to human activities, such as road development (Kaszta et al. 2020a). Habitat loss is reducing the extent, quality and accessibility of suitable habitats Fig. 1 Map of the study area, showing the camera trap locations and the Biodiversity Hotspots encompassed: Indo-Burma (a) and South-Central China (b). In addition, the Himalaya (c) is shown, an area recognised subsequently as a hotspot (Mittermeier et al. 2004), and the Sundaland (d), only partially included in the study area. The polygons delineating the biodiversity hotspots have been downloaded from the Critical Ecosystem Partnership Fund (CEPF)'s website (Hoffman et al. 2016) (Fahrig 1997;Hearn et al. 2018), and 24-63% of the regional terrestrial endemic species will likely become extinct by 2100 if current rates of habitat loss continue (Sodhi and Brook 2006).
Traditionally, conservation strategies have focused on single-species in local geographic areas, resulting in a limited ability to optimize multi-species conservation across ecosystems and management boundaries (Ruter et al. 2014). The extent of biodiversity loss throughout Southeast Asia stresses the importance of shifting conservation focus towards multi-species, transboundary scope (Lim 2016).
We present a framework for characterizing broad-scale, multi-species community assemblage drivers and predicting spatial patterns in biodiversity. Broad-scale biodiversity patterns are influenced by anthropogenic, environmental and management factors: biodiversity might be highest (a) where climatic factors create favourable environmental conditions for diverse species (Pearson and Dawson 2003), (b) where low levels of human pressure occur (Di Marco et al. 2013), (c) where higher levels of protection create safe havens (Bruner et al. 2001), or (d) where interactions between the aforementioned factors occur. Disentangling the drivers of biodiversity richness is essential for developing efficient regional, multi-species conservation planning. Moreover, describing species distribution is a foundation for prioritising and planning conservation strategies, providing a means to predict future changes in biodiversity and guiding optimization of conservation and development scenarios.
We addressed four interrelated goals. First, we investigated the factors driving species presence using multi-scale, multivariate modelling. Second, we predicted and mapped species richness across the region, and performed a gap analysis between the biodiversity model and the network of protected areas to highlight the hotspots lacking protection. Third, we determined the species whose presence was most strongly correlated to overall biodiversity and have an important role as indicators of species richness. Lastly, we employed statistical approaches to determine to what degree climate (hypothesis a), accessibility to human activities (hypothesis b), protected area status (hypothesis c), or their interactions (hypothesis d) explain modelled patterns of biodiversity. This framework provides a model for understanding patterns and drivers of biodiversity, and for prioritizing conservation areas for multiple species across transboundary extents using consistent and objective methods.

Study area
The study area extends between 0°and 30°N latitude and 80°E-120°E longitude, and covers a broad altitudinal range, spanning coastal areas to Himalayan peaks above 8000 masl (Fig. 1). The area encompasses all or part of 13 countries: Bangladesh, Bhutan, Cambodia, Laos, Myanmar, Nepal, Singapore, Taiwan, Thailand and Vietnam fall entirely within the study area, while China, India and Malaysia are partially encompassed.
Following the Köppen-Geiger climate classification (Beck et al. 2018), most of the study area falls within the tropical and humid-subtropical climatic zones, although the northernmost areas lie in the subtropical highland climatic zone, and the highest Himalayan regions in the tundra and subarctic climatic zones. The climatic diversity is reflected in the variety of ecosystems of the region, which includes nine of the fourteen biomes highlighted by Olson et al. (2001). In turn, the richness of ecosystems and habitats is associated with the aforementioned uniquely rich biodiversity that characterise the area.
Southeast Asia, however, has also been characterised, for at least the last 70 years, by intense human population growth and anthropogenic landscape change. The total human population of the countries encompassed in the region (excluding China and India) grew from 130 million people in 1950 to more than 500 million presently (United Nations 2019). Furthermore, the population is expected to grow for at least the next 30 years. Associated with this striking human population growth, Southeast Asia has experienced severe rates of forest loss, among the highest globally (Sodhi and Brook 2006). Overall, the countries encompassed in the study area (excluding China and India, but including Malaysia), have lost more than 25 Mha of forest from 2001 to 2019, with the greatest losses during this period being 28% for Malaysia (including the insular states) and 26% for Cambodia (Global Forest Watch 2020;Hansen et al. 2013).

Data collection
Between 2008 and 2016, our field teams carried out systematic camera trap surveys across Southeast Asia, covering the full mainland range of clouded leopard (Neofelis nebulosa). Sampling occurred mainly in national parks and reserves, and spanned a broad altitudinal range, from 45 masl in Pang Sida National Park, in Thailand, to 3901 masl in Langtang National Park, in Nepal. Camera traps were set 1.0-2.0 km apart, with two cameras per station at * 40 cm above the ground, and were deployed along forest trails, natural ridgelines and disused logging roads to maximize detection success of large felids . Nevertheless, the sampling protocol also captured a rich dataset of regional biodiversity. All terrestrial mammals, birds and reptiles whose species was unambiguously identified were included in the analysis. When a species could not be clearly identified, we retained the data at a broader taxonomic level (i.e., order, family or genus). In addition, we also included data related to human activities captured by camera traps, such as people and domestic animals. These have been used solely for assessing the habitat factors driving species assemblages, as the influence of anthropic disturbances on habitat selection by terrestrial species is clearly relevant, but were not incorporated in models of biodiversity distribution. For each species, we used the number of detections per camera trap station, applying a filtering method to ensure the independence of data and to reduce overestimation bias: we discounted records of the same species at the same camera trap station within 1 h, except when animals were individually recognizable and when genders and/or age classes were unambiguous.

Landscape covariates
We selected a preliminary set of 28 covariates covering a broad range of habitat gradients to investigate habitat requirements for the sampled species (Hughes 2017a). We included twelve landscape, four anthropic, three topographic, one climatic and eight spatial covariates. To investigate more biologically meaningful derivatives of the original covariates, they were transformed into 46 covariates by applying composition (i.e., class proportion on the landscape) and configuration (i.e., landscape continuity) metrics using FRAGSTATS (McGarigal et al. 2012) (Table S1).
We obtained the original raster layers of the preliminary covariates from different sources and therefore they had different spatial resolutions and projections. To harmonise the raster layers, we followed the framework recently used in our work (Macdonald et al. 2019), where we used a similar set of covariates. We first re-projected all the layers to Asia South Albers Equal Area Conic projection in ArcMap v10.6.1, by applying a nearest neighbour re-sampling technique for discrete layers and a bilinear interpolation re-sampling technique for continuous layers. Then, by applying the same re-sampling techniques, we re-sampled all raster layers to 250 m resolution, a resolution commonly used to model habitat suitability and biodiversity hotspots at continental and global extents (Rondinini et al. 2011).
Species select their environmental resources and conditions at different spatial scales (Macdonald et al. , 2019McGarigal et al. 2016). To investigate scalar relationships between sampled species and covariates, we calculated each metric at eight different scales, by using circular buffers of 250 m, 500 m, 1000 m, 2000 m, 4000 m, 8000 m, 16,000 m and 32,000 m radii, centred on each camera trap location.

Covariate selection and variance partitioning
Since zero-inflation of explanatory variables are likely to cause inaccurate parameter estimations and unreliable inferences (Martin et al. 2005), we removed poorly sampled covariates occurring at \ 10% of camera stations to avoid assessing unrepresentative habitat features. To investigate the most representative scales for sampled species, we performed Canonical Correspondence Analysis (CCA) (McGarigal et al. 2000;ter Braak 1986) independently at each scale for each covariate, using the vegan package (Oksanen et al. 2018) in R v3.5.1 (R Core Team 2018). For each covariate, we retained the scale whose univariate CCA showed the highest canonical eigenvalue (Borcard et al. 1992).
We then assessed multicollinearity by calculating Pearson's correlation coefficient between all covariate pairs. When two covariates were highly correlated (|r|C 0.7), we dropped the covariate whose univariate CCA showed the lowest adjusted-R 2 (Guisan and Zimmermann 2000). We selected the final set of covariates by performing forward selection for each group of covariates, retaining only the significant ones (p \ 0.001) (Cushman and McGarigal 2004).
It should be noted that the preliminary covariates were selected to represent as wide habitat gradients as necessary to evaluate biodiversity distribution. The additional steps to assess composition and configuration metrics of the original covariates, as well as to evaluate their representative spatial scales, were performed to analyse how, and at what scale, habitat factors that we had already identified as fundamental for biodiversity, affected its geographic distribution.
To investigate the relative contribution of each group of covariates, we performed a variance partitioning analysis (Borcard et al. 1992) using the vegan package. Variance partitioning quantifies the independent contribution of each group of covariates to the global variance explained, as well as the shared variance explained by interacting combinations of covariates (Borcard et al. 1992;Cushman and McGarigal 2004).

Modelling species richness
Since our data were counts of species detected at each camera station, to model species richness we performed Poisson generalized linear model (GLM) for each species using covariates at their representative spatial scales, in R v3.5.1. GLMs are commonly used regression models that allow the response variable to have different distributions than the normal one (Guisan and Zimmermann 2000), and Poisson distribution is used when the response variable is composed of abundance data (Vincent and Haworth 1983), as in our case. Projected models were reclassified to binary form, with zero and negative value pixels treated as absences and pixels with positive values as presences. Finally, singlespecies presence-absence maps were summed to predict species richness (Grand et al. 2004).
To evaluate the performance of the multi-species model, we trained the models with 80% of camera trap stations, and used the remaining 20% for validation. The multi-species model was validated by performing GLM between the modelled and the empirical number of species sampled at camera stations, and calculating the Nagelkerke-pseudo-R 2 . The Nagelkerke-pseudo-R 2 (Nagelkerke 1991) is an index, ranging from 0 to 1, that provides a measure of the goodness-of-fit of logistic regressions. It is important to specify that, differently from linear regressions for which R 2 is a real measure of the goodness-of-fit that calculates models' explained variance, for logistic regressions a similar measure does not exist. However, pseudo-R 2 is a relative measure of how well a model explains the data, and can be used to compare different models. Additionally, we compared the performance of the multi-species model with a ''null'' model obtained by summing the IUCN geographic range layers of the sampled species (IUCN 2019). The range layers considered were solely the polygons of the extent of occurrence (EOO) in which the species were considered extant and resident. We performed GLM between the number of species predicted by the IUCN model and the empirical number of species at camera stations, and calculated the Nagelkerke-pseudo-R 2 .

Gap analysis and species importance
We quantified the amount of protected biodiversity by calculating the ratio between the cumulative number of species predicted within protected areas (i.e., the number of species obtained by summing the number of species predicted within each pixel encompassed within protected areas) and the cumulative number of species predicted in the study area (i.e., the number of species obtained by summing the number of species predicted within each pixel in the study area) (Grand et al. 2004). Hence, we evaluated the effectiveness of protected areas and highlighted where additional ones should be implemented to fill the gaps in habitat protection.
We assessed the importance of sampled species as indicators of biodiversity by performing GLMs independently for each species. Predictor and response variables were random samples of points, in number proportional to the pixels representing the study area (1:1000), selected from the species richness model and from each single-species presenceabsence model, respectively. We investigated deviance explained by each model and ranked each species according to its ability to predict overall biodiversity.

Drivers of biodiversity patterns
To evaluate how well our model predicted empirical species richness for each country, we performed GLM between the modelled and empirical number of species at the camera traps used for validation. Then, using the residual values of each location, we performed ANOVA to test for significant differences between countries, followed by Tukey's test to assess which countries were significantly different from the others.
We assessed the drivers of predicted biodiversity patterns by sampling 10,000 random points from the species richness model and from covariate layers to evaluate our four hypotheses (i.e., climatic, low human pressure, protected status and interactions between them). The tested covariates were mean annual temperature and mean annual precipitation (Fick and Hijmans 2017) for the climatic hypothesis, human footprint (WCS and CIESIN 2005) and roughness for the anthropic hypothesis, and protected areas (IUCN and UNEP-WCMC 2017) for the management hypothesis. Roughness layer was derived from a digital elevation model (Jarvis et al. 2008) and obtained by applying geomorphometric transformations (Evans et al. 2014), and was considered as a proxy of landscape inaccessibility to human activities (Cushman et al. 2017). We followed the same procedure we employed to model species richness, to re-project and re-sample the layers. Next, we performed linear model (LM) to assess the ecological relationships between modelled species richness and the aforementioned covariates. Last, we assessed changes in predicted species richness as functions of increasing each explanatory variable from the 10th to the 100th percentile, while holding all other covariates at their median value, illustrating which covariate had the strongest effect in driving biodiversity richness (Wasserman et al. 2012).

Sampling effort
Camera traps were deployed at 1384 camera trap stations in 15 landscapes across 7 countries in Southeast Asia, yielding a combined sampling effort of 115,389 trap nights (average trap nights per camera trap station = 83.4 ± 2.6 SE). We sampled 90 species, of which 74 were mammals, 15 terrestrial birds and 1 reptile. In addition, we also sampled 4 anthropic elements: forest guards, including all rangers and patrollers in protected areas, humans (all other people excluding park staff), domestic dogs and domestic cattle. We assessed the effect of these anthropic elements on the habitat selection of sampled species, but we discounted them when we modelled species richness. The maximum numbers of species sampled per station were 15 in Cambodia, 15 in India, 17 in Laos, 14 in Myanmar, 13 in Nepal, 21 in Peninsular Malaysia and 33 in Thailand (Table 1). The list of sampled species is provided in Table S2, and the number of species sampled per country is reported in Table S3.

Univariate scaling and forward selection analysis
Scale optimization revealed relatively high homogeneity in the scales selected across predictor variables. The broadest scale (32,000 m) had the strongest relationship with species occurrence patterns for 26 covariates (68.4%). Overall, broad scales (16,000 m and 32,000 m) were selected for 32 covariates (84.2%), while medium scales (2000 m, 4000 m and 8000 m) and fine scales (250 m, 500 m and 1000 m) were both selected for two covariates (5.3%) ( Table 2).
Two covariates were poorly represented and were excluded due to lack of predictive power. We also dropped 33 highly correlated covariates. We then applied forward selection to identify the significant covariates from each group. The stringent threshold did not lead to the removal of any covariates. The final set was composed of 11 covariates (Table 2).

Variance partitioning and species-habitat relationships
The covariates selected for the CCA explained 32.5% of the variance in species detections. Camera effort had the strongest relationship (78% of the explained variance individually and 80% conditionally), followed by landscape covariates (11% individually and 18% conditionally). Topographic and spatial covariates each explained individually only 1% of the variance (Fig. 2).
The CCA highlighted that species assemblages were primarily driven by environmental components (protected areas, mosaic areas, non-forested areas, shrublands and grasslands, water and forest cover). However, topographic (elevation) and spatial (Y) gradients also had substantial relationships with assemblage patterns (Fig. 3, Table S4).
Analysis of the biplot revealed assemblages of species associated, at varying degrees, with specific habitat features (Fig. 3). A conspicuous group was clearly driven by elevational gradient, differentiating low-elevation and high-altitude species. Another group of species was mainly associated with forests, mosaic areas and water, while a more heterogeneous group of species was influenced by slope position and non-forested areas.

Single-species models
Single-species models showed that most species were positively associated with forests (GYRATE_AM and PLAND), protected areas, shrublands and grasslands and non-forested   (Table S5).

Model comparison, gap analysis and species importance
We identified the factors associated with species richness, and used them to predict biodiversity across the study area (Fig. 4). We validated the multi-species model using independent test data, showing a positive model fit (Nagelkerke-pseudo-R 2 = 0.14, p \ 0.0001) (Fig. S1). The IUCN model showed a much lower model fit (Nagelkerkepseudo-R 2 = 0.0017, p = 0.50) (Fig. S2) and failed to highlight known biodiversity hotspots (Figs. S3-S4). Covariates excluded from the model because occurring in \ 10% of the camera trap stations * Covariates excluded from the model after the multicollinearity analysis The gap analysis revealed that 18.6% of the predicted biodiversity occurred within protected areas. The countries showing the lowest proportions of protected biodiversity were Singapore, Bangladesh and India, preserving 6.51%, 12.23% and 12.93% of their biodiversity, respectively (Table S6). Surprisingly, even the most effective protected areas preserved only a small percentage of biodiversity, with the top three protecting, respectively, 1.37%, 1.35% and 0.84% of the total cumulative number of species, and 7.38%, 7.27% and 4.52% of the cumulative number of species occurring within protected areas (Table S7). In addition, we identified biodiverse areas that are not protected, and whose preservation would dramatically increase the amount of protected biodiversity (Fig. 4).
The biodiversity indicator analysis revealed that species richness was mainly predicted by carnivores, with ten carnivores occurring among the first twenty indicator species. These included members of the Viverridae, Herpestidae, Canidae, Ursidae, Mustelidae and Felidae families, with Viverridae as the most predictive indicator of overall biodiversity. The role of artiodactyls was also noteworthy, with three Bovidae, two Cervidae and mouse-deer species occurring in the first twenty positions. Among the most informative indicators were also rat species, two gallinaceous birds and one reptile (Table S8).

Drivers of biodiversity patterns
ANOVA on the residuals of the GLM between the modelled and the empirical number of species showed a significant difference among countries (F (6, 270) = 72.5, p \ 0.0001). Specifically, Peninsular Malaysia showed positive distribution of the residuals, meaning that the country has more species richness than predicted, given its environmental conditions (Fig. S5). Contrarily, Nepal, India and, to a lesser degree, Laos and Myanmar, showed negative residuals, meaning that our model overestimated the number of species in these countries. Additionally, Tukey's test revealed significant differences between almost  Table S2 for species' codes all the country dyads, with the exception of Thailand-Cambodia and Myanmar-Laos (Table S9).
We found that climatic covariates contributed the most to the predicted patterns of species richness. Mean annual temperature was negatively associated with species richness, although it showed nonlinear relationship (Fig. 5), while precipitation showed a positive relationship (Fig. 6). Protected areas also showed positive relationship with biodiversity. Anthropic factors showed negative but weaker effect on patterns of biodiversity than both climatic factors and protected status (Table 3).
Plotting changes in species richness as function of increasing values of explanatory covariates from the 10th to the 100th percentile corroborated the importance of protected status, and the negative impact of accessibility for anthropic activities on biodiversity (Fig. 7).

Discussion
We provide new insights into spatial patterns and drivers of species richness in Southeast Asia, highlighting that biodiversity is not evenly distributed across the region, but exhibits a complex pattern with two large primary biodiverse areas: the Thai-Malay Peninsula (TMP) and the mountainous region of Southwest China (MSC), with a scattering of smaller biodiverse areas between them. Hughes (2017a) described similar pattern, producing distinct models for different taxa in Southeast Asia, although the MSC was highlighted as hotspot only in the birds' model. However, it is likely that a more taxonomically inclusive model would have corresponded more closely with ours.  Table S7 The habitat features of the two main hotspots are distinct, leading to unique communities of species. Yet both have in common landscapes characterized by high environmental and climatic heterogeneity. In the TMP, the highly productive, wet and stable tropical rainforest climate, coupled with strong topographic heterogeneity along   (Wikramanayake 2002). Similarly, the MSC environment is characterized by ridge-valley complexes shaped by large rivers, and divided by tall mountain ranges. This landscape is characterized by high environmental and climatic heterogeneity, promoting ecological niche diversity, and supporting a large variety of species in a relatively small area (Wikramanayake 2002).

Species-habitat relationships
The multi-scale optimization approach revealed broad-scale selection for almost all covariates. Given the vastness of our study area, this result likely reflects the fact that habitat loss, poaching pressure and other human perturbations are broad-scale and that remaining areas of high biodiversity are in large remnant ecosystems. However, this result might also be due to an insufficiency of the spatial scales assessed in the multi-scale  Jackson and Fahrig (2015) if, in a multi-scale assessment, a species selects most of the habitat factors at the largest or at the smallest scale, this might be symptomatic of a limitation in the number, and in the magnitude, of the scales assessed. In other words, the true representative scale could be outside the range of evaluated ones. While acknowledging this possibility, insofar as it might be a potential limitation of our study, it can arise only if the true, but undetected, representative scales are broader than those we report, which extended up to 32 km. It is very unlikely that finer un-sampled scales were important, given the scale range we analysed. The main implication of our results is that species richness is mostly related to environmental and anthropic features at very broad scales. Southeast Asian environmental features have a predominant influence on species richness. Presence of forests and water are particularly important in explaining species presence, with both composition (habitat percentage) and configuration (aggregation index) metrics retained in the final model. Variance partitioning supported these results, showing that model's variance was mainly explained by environmental covariates, while topographic and spatial factors only marginally contributed to explain species assemblages.
The CCA biplot elucidates the relationships of individual species with habitat features, and their contribution to shaping communities. Forest cover, non-forested areas, shrublands and grasslands, mosaic areas and water showed the highest eigenvalues. However, Y (latitude) and elevation also showed high values revealing their importance in shaping the distribution of some species, even with limited weights in the variance partitioning. Indeed, the biplot (Fig. 3) clearly highlights that a number of species follow an elevational and latitudinal gradient. Additionally, two other distinct groups emerged: the first, on the left side of the biplot, driven mostly by forest cover, mosaic areas and water, and including species associated with forests such as banded palm civet (Hemigalus derbyanus), banded linsang (Prionodon linsang), Malay tapir (Tapirus indicus) and Southern pig-tailed macaque (Macaca nemestrina), and the second, on the bottom right corner of the biplot, driven by presence of protected areas, slope position and non-forested areas, and composed of a more heterogeneous group of species including golden jackal (Canis aureus), tiger (Panthera tigris), Asian black bear (Ursus thibetanus) and Asian elephant (Elephas maximus).
Importantly, most of the species in this latter group are known to be highly sensitive to forest loss and anthropogenic killing. Their association with low elevation suggests their preference for warm tropical habitats. However, the joint association with non-forested areas indicates that very little extensive native forests persists at low elevations, and that it is concentrated in protected areas.

Biodiversity indicators
The use of single indicator taxa to prioritise management actions has long been employed in conservation (Noss 1990;Rodrigues and Brooks 2007), although the assumption that preserving the habitat for a taxon would automatically guarantee protection for other sympatric species has been contested, and its effectiveness is debated . We found that carnivores are a good surrogate of sampled terrestrial biodiversity, with several species from a variety of families being potential indicators. The role of carnivores as indicators of the sampled biodiversity likely reflects their role as umbrella species. Specifically, these species are among the most sensitive to habitat loss, mortality risk and other perturbations due to their large size, high vulnerability and large home ranges. Therefore, their occurrence in specific areas suggests that site has high ecological integrity and low levels of perturbation, associated with high richness in other species. Our results are in line with previous findings on the importance of carnivores as indicator and umbrella species (Carroll et al. 2001;Dalerum et al. 2008), and on the potential role of large ungulates and birds as indicators (Larsen et al. 2012).
However, we agree that prioritisation strategies based on single taxa might fail to highlight important conservation areas for rare or cryptic species. Hence, we advocate the use of a combination of indicators to cover broader ranges of habitat gradients, ensuring the protection of a larger number of species, as recommended for conservation planning (Carroll et al. 2001;Macdonald et al. 2012) and for landscape connectivity (Cushman and Landguth 2012).

Model performance
The model validation revealed our species richness model to be a relatively good predictor of biodiversity, capturing the high species richness of the MSC, a known biodiversity hotspot (Myers et al. 2000). Considering that the camera survey was largely carried out in the TMP, this congruence adds confidence that our model successfully predicted biodiversity patterns across the region. We recommend future studies to undertake data collection in the MSC to test our predictions for that region.
Comparing our model with current knowledge of species distribution based on IUCN geographic range layers, revealed that our model performed much better based on the Nagelkerke-pseudo-R 2 . The IUCN model, in addition to misclassifying the empirical species richness showing lower Nagelkerke-pseudo-R 2 , failed to highlight some of the important biodiversity hotspots, such as the MSC, and classified as hotspot less biodiverse areas. Similar discrepancies were also highlighted by Hughes (2017a). This lack of congruence raises concerns, as the IUCN layers are often the main source to produce multispecies assessments (Rondinini et al. 2011). We argue that models based on extensive empirical data and robust analytical approaches are more accurate (Di Marco et al. 2017), and that continued investment in direct biodiversity monitoring is essential to collect reliable empirical data .
Comparing our model with the IUCN model, we found that the region between the Chinese province of Yunnan, Central Myanmar and East India, differed the most (Fig. S4). Here the IUCN model predicted more species than ours. The MSC also showed strong discrepancy, with our model predicting more species than the IUCN one. This suggests that other factors, likely very high past deforestation, are responsible for the lower species richness in these regions.

Drivers of biodiversity pattern
We found significant differences between countries in the residuals of predicted biodiversity, suggesting potential geopolitical causes behind these discrepancies. Peninsular Malaysia show higher observed biodiversity than predicted by our model, while Nepal and India have lower observed biodiversity. This suggests that differences in factors distinct than those we evaluated here are also critical in driving regional biodiversity. Specifically, it is likely that Peninsular Malaysia has higher biodiversity partially due to better governance, rule of law or enforcement, while the overestimation of biodiversity in Nepal and India might reflect greater ineffectiveness in management and enforcement. We recommend future work to highlight the geopolitical factors affecting species richness in these countries.
Climatic factors were revealed as the main drivers of the overall spatial pattern. Temperature and precipitation were strongly associated with predicted biodiversity, the former showing a negative, nonlinear, relationship (Fig. 5), and the latter with a positive effect (Fig. 6). We suggest these results to be due to the importance of the MSC in our model, where the lowest temperatures across the study area are recorded. Protected status also have strong positive effect, as demonstrated by the quantile plot showing that changes in protection have the largest impact on increasing species richness (Fig. 7).
Hence, the current pattern of biodiversity is the result of the combination of climatic and, consequently, environmental conditions, with protection status. Human footprint and topographic roughness were associated with the largest decreases in modelled species richness, indicating strong reduction of biodiversity in areas highly accessible to human activities. Overall, these results show a complex pattern of environmental and anthropic factors driving local patterns of species richness, within broader gradients defined by climatic conditions.

Gap analysis and conservation recommendations
We revealed poor match between existing protected areas and regions of predicted high biodiversity, suggests two conservation imperatives: (1) to extend the protected areas already existing, as well as the corridors linking them (Cushman et al. 2018;Kaszta et al. 2020b), and (2) to encourage cultural tolerance of coexistence and an ethic of wildlife protection (Western et al. 2019).
In line with classic biogeographic thinking, it is desirable to expand the protected areas network into a vaster, more connected system. One of the main hotspots, the MSC, has very limited coverage by protected areas. These should be extended, particularly in Southern Yunnan, including the Indian state of Assam and northern Myanmar, and designed to provide stepping-stones along the major connectivity corridors of the region (Kaszta et al. 2020b).
Biodiversity in the southern extent of the study area is also most lacking in protection, despite its importance. Almost none of the predicted highly biodiverse areas in Cambodia fell within protected areas. We also identified several critical areas occurring across countries, underlining the importance of transboundary conservation programmes (Kaszta et al. 2020b). The biodiverse region spanning the border between Vietnam and Laos is only partially protected. Here, the Vietnamese system should be expanded to cover broader portions, especially in the south where we predicted large biodiversity areas but protection is inadequate and human impact is high. Myanmar shares a hotspot with Thailand, but only the Thai side is substantially protected. Similarly, the western coast of Myanmar adjoining Bangladesh is mostly unprotected despite its very high biodiversity. These areas are also identified as the most important nodes in a regional connectivity network for clouded leopard (Kaszta et al. 2020b). Finally, Peninsular Malaysia has the highest levels of species richness, but only a small portion is protected, despite its vulnerability to isolation from other areas of species richness due to its long, thin and vulnerable corridors (Kaszta et al. 2020b).

Conclusions
Implementing and managing protected areas is not easy, especially over an area as vast as the one we assessed. In addition, the preservation of a high number of species as a consequence of their occurrence within protected areas is not a foregone conclusion (Hallmann et al. 2017). In recommending an expansion of the protected areas network we are mindful of the hazard of creating paper parks if protection does not advance hand in hand with enforcement, enhanced willingness to embrace co-existence, adequacy of finances, all set in the framework of regional, transboundary, conservation strategies that are often either unthought-out or unfulfilled (Di Minin and Toivonen 2015;Western et al. 2019). Our finding that only a small portion of biodiversity is protected, prompts us to prioritise thinking, and action, about both expanding that network and identifying a wider portfolio of complementary conservation strategies. In a region like Southeast Asia, where over-exploitation of natural resources and deforestation are the main drivers of biodiversity loss (Sodhi and Brook 2006;Sodhi et al. 2004), we argue for a holistic, integrated approach between conservationists, local stakeholders and governmental institutions, supported by trans-disciplinary evidence and insight (Macdonald et al. 2019). Only through consideration of the well-being of both wildlife and people, at the intersection of conservation and social justice, can we be hopeful for co-existence (Vucetich et al. 2018).
As argued by Sodhi and Brook (2006), a route to tangible conservation outcomes in Southeast Asia is via the assessment and integration of social issues with conservation actions. In addition, among the main components likely to encourage coexistence are perceptions of benefits and costs, and rectifying both real and imagined imbalances between them (Dickman et al. 2011;Western et al. 2019). Hence, although providing local, fine-scale, socio-economic recommendations to foster conservation practices in Southeast Asia goes beyond the scope of this paper, overall our results should be framed by the wider social and geopolitical circumstances of the region, and we argue for conservation interventions that fully embrace the holism of human-wildlife coexistence: striving, not only for the conservation of species and habitats, but also for the economic benefits, equity, rights and active participation for people depending on these areas (Chazdon et al. 2009).
Here we implemented a multi-scale, multi-species modelling framework to reveal the areas richest in biodiversity in Southeast Asia. We demonstrated the importance of undergoing a paradigmatic switch from single-species to multi-species models to better understand conservation priorities. We hope that our results will contribute to conservation of terrestrial vertebrates in the region, highlighting where conservation actions should be prioritised. Further, the modelling framework that we illustrated here could usefully be implemented in other areas of the world, to highlight biodiversity hotspots and to identify conservation gaps with greater precision that has been possible hitherto. Finally, based on the evidences reported here, we make a strident call for transboundary conservation planning to protect natural habitats from the imminent risks of degradation, and for strategic efforts to identify and prioritise core habitats and connectivity linkages among them at broad, regional scales. Conservation, and Freeland Foundation. For Malaysia, we thank the Department of Wildlife and National Parks, and Malaysia's Economic Plan Unit for the research permits and continuous support. For Laos, we thank the WCS-Laos, and the Nam Et-Phou Louey National Protected Area for their crewmembers and generous in-kind support. For Cambodia, we thank WWF Cambodia, Fauna and Flora International, and the Ministry of Environment. For Nepal, we thank the Department of National Parks and Wildlife Conservation of Nepal for granting research permission, Langtang National Park administration and staff, and Bear Research and Conservation Nepal. DWM also thanks World Animal Protection for a grant to support research in Langtang National Park.
Author contributions The larger framework on which this study is based was conceived by DWM, who designed the overall field plan, directed the work and secured the funding. Individual field teams, led by other co-authors, deployed the cameras, gathered and collated the data. LC, in collaboration with SAC, ZK, HMB and DWM, led the analysis, and these authors also led on drafting the manuscript. All authors participated in, and approved, the completion of the manuscript.
Funding The majority of the team, as well as the data, were part of the core WildCRU effort supported principally by a Robertson Foundation grant to DWM.
Availability of data and material Input GIS layers of the final covariates are available in Electronic Supplementary Material II. Given the extremely sensitive nature of species occurrence data with respect to illegal wildlife trade, locations of camera traps will not be made public to avoid further endangering the species. However, we welcome correspondence with scholars and conservationists regarding collaborative use of the data to advance science and conservation of Southeast Asian species.Code availability We welcome correspondence with scholar and conservationists regarding collaborative use of the codes applied.

Compliance with ethical standards
Conflicts of interest The authors declare that there is no conflict of interest.

Consent to participate
The authors agree to be part of the publication.

Consent for publication
The authors give their consent to publish the contents of the work.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. 1