Study region and distribution data
We used range maps from the International Union for Conservation of Nature (IUCN) for all five Rhinopithecus species as current distribution data (IUCN 2014). We converted the IUCN polygons to 5 × 5 km grid cells and then to one point per cell (the cell centre). The range estimates were refined by excluding all points outside the species’ known elevation ranges, which are between 200 and 1200 m for R. avunculus (Xuan Canh et al. 2008), 570 and 2300 m for R. brelichi (Bleisch et al. 2008), 1400 and 2800 m for R. roxellana (Yongcheng and Richardson 2008), 1720 and 3190 m for R. strykeri (Geissmann et al. 2012), and 3000 and 4700 m for R. bieti (Bleisch and Richardson 2008). Furthermore, we excluded all areas with tree cover below 50%, all areas with human population density above 100 and/or all areas with values above 20 on the human influence index (HII) (see next section for explanation of HII). Afterwards, to reduce spatial bias, we thinned the points using Occurrence Thinner version 1.04 (Verbruggen 2012; Verbruggen et al. 2013), leaving 142 occurrence points within the IUCN ranges. In addition, records of historically extirpated populations of Rhinopithecus in China were acquired from Li et al. (2002). By digitizing maps from Li et al. (2002), we derived 96 approximate locations for historical records from 1616 to 1949, including 70 locations outside the current range of the genus (Fig. 1).
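The refinement rules above can be illustrated with a minimal Python sketch. It is not the ArcGIS workflow used in the study: the `CellPoint` record, the `keep_point` function and the example values are hypothetical, and the population density threshold is applied in whatever units the density layer uses, since the text does not state them.

```python
from dataclasses import dataclass

# Elevation limits (m) per species, as quoted in the text.
ELEVATION_RANGE_M = {
    "R. avunculus": (200, 1200),
    "R. brelichi": (570, 2300),
    "R. roxellana": (1400, 2800),
    "R. strykeri": (1720, 3190),
    "R. bieti": (3000, 4700),
}

MIN_TREE_COVER_PCT = 50   # cells below this tree cover are excluded
MAX_POP_DENSITY = 100     # units follow the density layer (not stated in the text)
MAX_HII = 20              # human influence index, scale 0-64


@dataclass
class CellPoint:
    """Hypothetical record for one 5 x 5 km cell centre."""
    species: str
    elevation_m: float
    tree_cover_pct: float
    pop_density: float
    hii: float


def keep_point(p: CellPoint) -> bool:
    """Return True if the point passes all four refinement filters."""
    low, high = ELEVATION_RANGE_M[p.species]
    return (
        low <= p.elevation_m <= high
        and p.tree_cover_pct >= MIN_TREE_COVER_PCT
        and p.pop_density <= MAX_POP_DENSITY
        and p.hii <= MAX_HII
    )


# Made-up example points, one per filter outcome.
points = [
    CellPoint("R. bieti", 3500, 72, 12, 8),       # passes every filter
    CellPoint("R. bieti", 2500, 80, 5, 4),        # below the elevation range
    CellPoint("R. roxellana", 2000, 35, 20, 10),  # tree cover too low
    CellPoint("R. brelichi", 1500, 65, 40, 31),   # HII too high
]
kept = [p for p in points if keep_point(p)]
```

Values exactly on a limit are kept here, because the text excludes only values "below" or "above" the stated thresholds.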
The range data for all Rhinopithecus species were combined in most of our modelling (i.e., modelling Rhinopithecus as a single taxon), mainly because the historical records could not be identified to species level, but also to overcome niche truncation. All Rhinopithecus species, with the exception of R. avunculus, occur in subtropical to temperate forest (see Supporting Information Appendix S1 for a map of ecoregions/vegetation zones) and hence may be hypothesized to have similar ecological requirements. Rhinopithecus bieti, R. strykeri, R. brelichi and R. roxellana mainly occur in mixed deciduous and evergreen broadleaf forest, with R. bieti, R. strykeri and R. roxellana also occurring in coniferous forest; R. avunculus deviates most, being found in tropical evergreen forest (Bleisch et al. 2008; Bleisch and Richardson 2008; Xuan Canh et al. 2008; Yongcheng and Richardson 2008; Geissmann et al. 2012). Furthermore, Rhinopithecus has high dietary plasticity and occurs in areas with large variations in climate between summer and winter (Long et al. 1994; Yiming 2006; Guo et al. 2007; Xiang et al. 2007b, 2012; Grueter et al. 2009a; Wong et al. 2013). Additionally, at least some of the species had wider distributions as recently as within the last 400 years (Li et al. 2002), including in areas that are more climatically different than possibly unoccupied areas within the IUCN polygons. Moreover, as stated in the introduction, some threatened species might be refugee species living in suboptimal habitats (Kerley et al. 2012; Cromsigt et al. 2012), so the current range might not show their full climatically suitable range.
The study area includes a part of East Asia, including areas in China, Myanmar and Vietnam where Rhinopithecus are known to occur (Fig. 1). The study area was delimited by drawing a rectangle around a 500-km buffer outside the species range maps and historical occurrence points.
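As a rough illustration of this delimitation: the bounding rectangle of a 500-km buffer equals the bounding box of the input geometry expanded by 500 km on every side. The sketch below assumes coordinates in metres in a projected system; the function name `study_extent` and the coordinates are hypothetical, and using only polygon vertices and point locations is a simplification of the GIS operation.

```python
BUFFER_M = 500_000  # 500 km expressed in metres


def study_extent(coords, buffer_m=BUFFER_M):
    """Return (xmin, ymin, xmax, ymax) of the rectangle enclosing a buffer.

    coords: iterable of (x, y) pairs in projected metres, e.g. range-polygon
    vertices plus historical occurrence points.
    """
    coords = list(coords)
    if not coords:
        raise ValueError("at least one coordinate is required")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs) - buffer_m, min(ys) - buffer_m,
            max(xs) + buffer_m, max(ys) + buffer_m)


# Made-up coordinates for illustration only.
extent = study_extent([(0, 0), (100_000, 200_000), (50_000, -20_000)])
```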
Environmental and anthropogenic data
Multiple models were calibrated to identify which climate variables are most strongly associated with the distribution of the genus. Initially, eight climatic variables from the WorldClim database (Hijmans et al. 2005) were considered: (1) annual mean temperature (AMT), (2) mean temperature of warmest quarter (MTWQ), (3) mean temperature of coldest quarter (MTCQ), (4) minimum temperature of coldest month (MinTCM), (5) annual precipitation (PANN), (6) precipitation of wettest quarter (PWetQ), (7) precipitation of driest quarter (PDryQ), and (8) precipitation of coldest quarter (PColdQ). The four temperature variables, PANN, PDryQ and PWetQ were chosen because of their effect on vegetation and thereby habitat. PColdQ was chosen because the monkeys, except R. avunculus and to some degree R. brelichi, mainly live at high elevations, where precipitation during the coldest quarter will largely fall as snow and possibly limit food availability.
The historical records for extirpated populations of Rhinopithecus were mainly located in Central East (CE) and Southeast (SE) China. We note that the temperature varied within the period of the historical records (1616–1949) and also subsequently up to the present day. Generally, the climate was colder in the period of the historical records, with a maximum temperature difference between the coldest period (around 1660 and again around 1840) and the current temperature of 1.8 °C for CE China and 1.2 °C for SE China on a decadal time series, and 0.7 °C for CE China and 0.6 °C for SE China on a centennial time series (Ge et al. 2013). To investigate the possible effect of these temperature shifts and, specifically, the colder period during the time of the historical records, we performed a temperature sensitivity analysis by lowering the current temperature by different magnitudes, subtracting 0.7, 1.5 and 2.0 °C, respectively, from each of the current temperature variables before calibrating the distribution models. The down-adjusted climate variables were used to assess the sensitivity of the models to the climate adjustment and to compare the results between models calibrated with differently adjusted climate data.
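The adjustment itself is a uniform shift of the temperature layers. A minimal sketch for a single grid cell follows; the dictionary layout, the function name `cooled_scenarios` and the cell values are hypothetical. Precipitation variables are left unchanged, since only temperature was adjusted.

```python
# Temperature variables named in the text (degrees Celsius).
TEMPERATURE_VARS = ("AMT", "MTWQ", "MTCQ", "MinTCM")
OFFSETS_C = (0.7, 1.5, 2.0)  # amounts subtracted in the sensitivity analysis


def cooled_scenarios(cell, offsets=OFFSETS_C):
    """Return {offset: adjusted copy of cell} with temperatures lowered."""
    scenarios = {}
    for offset in offsets:
        adjusted = dict(cell)  # copy so the current climate stays untouched
        for name in TEMPERATURE_VARS:
            adjusted[name] = cell[name] - offset
        scenarios[offset] = adjusted
    return scenarios


# Made-up values for one cell.
current = {"AMT": 10.0, "MTWQ": 18.0, "MTCQ": 1.0, "MinTCM": -4.0,
           "PANN": 1200.0, "PDryQ": 60.0}
scenarios = cooled_scenarios(current)
```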
Furthermore, we used topographic data from the Shuttle Radar Topography Mission (SRTM) (Jarvis et al. 2008) for elevation (ELEV), and computed slope (Slope), standard deviation (STD), and topographic roughness (slope of slope) (TR). Data for protected areas in China were derived from the World Database on Protected Areas (IUCN and UNEP-WCMC 2014), and to capture tree cover we used the Moderate Resolution Imaging Spectroradiometer (MODIS) Vegetation Continuous Fields from 2010 (DiMiceli et al. 2011).
We used two different measures of anthropogenic pressure to refine our current distribution data: human population density for 2010 (CIESIN 2015) and the Human Influence Index (HII) (WCS and CIESIN 2005). HII is an index ranging from 0 (no impact) to 64 (maximum impact) that combines data for population density with data for human land use and accessibility (roads, railroads, navigable rivers and coastlines) and can be used to describe anthropogenic impacts on the environment. The refinement was done by excluding all areas within the IUCN range maps with human population density above 100 and/or all areas with values above 20 on the HII.
All data were projected to the Albers Equal Area Conic projection, and converted to their mean values for 5 km × 5 km grid cells. ArcGIS 10.2 (ESRI, Redlands, CA) was used for all GIS operations.
Maximum entropy modelling/distribution modelling
One of the modelling methods used was maximum entropy modelling (Maxent version 3.3.3k; http://www.cs.princeton.edu/~schapire/maxent/), which is a machine learning method for mapping habitat suitability or estimating the potential distribution of a species (Phillips et al. 2006; Phillips and Dudík 2008; Elith et al. 2011). Maximum entropy modelling is among the best-performing methods for species distribution modelling and frequently outperforms traditional statistical approaches and other species distribution modelling methods (Elith et al. 2006; Phillips et al. 2006).
Pairwise Pearson’s correlation coefficients (r) for all variables over the entire study area were calculated (see Supporting Information Appendix S2 for r values) to quantify collinearity. Although Maxent is relatively robust against collinear variables, collinearity can impair the estimation of the influence of individual variables on the model. Among the climatic variables, AMT was highly correlated with MinTCM (r = 0.98), MTWQ (r = 0.92), and MTCQ (r = 0.96). AMT contributed less to the predictive power than MinTCM and MTCQ, and was therefore not included in the modelling. Furthermore, MinTCM and MTCQ were also highly correlated (r = 0.98) and were consequently only included in mutually exclusive models. MTWQ was also correlated with MTCQ (r = 0.78) and MinTCM (r = 0.85). MTWQ did not explain much of the variance, but provided a small increase in predictive power and was therefore kept in four of the final models (Table 1). PWetQ was strongly correlated with PANN (r = 0.94), with PANN contributing more to predictive power than PWetQ; therefore PWetQ was removed. PDryQ and PColdQ were also correlated (r = 0.98) and contributed almost equally to the predictive power. PDryQ indicates the minimum amount of precipitation for a quarter and can be more of an indirect stress factor for vegetation than PColdQ. PColdQ was therefore removed.
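The collinearity screening can be sketched as follows, assuming each variable has been flattened to a list of cell values over the study area. The function names, the example values and the cutoff of 0.75 are hypothetical: the text reports correlations but does not state a formal threshold, and the choice of which variable to drop was based on contribution to predictive power, which this sketch does not reproduce.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(layers, threshold=0.75):
    """List (name_a, name_b, r) for variable pairs with |r| >= threshold.

    The default threshold is an illustrative assumption.
    """
    flagged = []
    for a, b in combinations(sorted(layers), 2):
        r = pearson_r(layers[a], layers[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Made-up cell values for three variables.
layers = {
    "AMT": [1, 2, 3, 4, 5],
    "MTCQ": [2, 4, 6, 8, 10.5],
    "PANN": [5, 1, 4, 2, 3],
}
flagged = correlated_pairs(layers)
```

Pairs flagged this way are candidates for removal or for mutually exclusive models, as was done for MinTCM and MTCQ.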
Out of the four topographic variables, only ELEV contributed a little to the overall predictive power, with the rest having no influence at all. ELEV, STD, Slope and TR were consequently left out of the final models. Both measures of anthropogenic pressure, human population density for 2010 and HII, were also left out of our final models, so our final models only model climatic suitability (see Supporting Information Appendix S3, S4 and the Results section for further details).
The final variables in the models were PANN, PDryQ, MTWQ, and MinTCM or MTCQ. This resulted in a total of four models, which were run with current distribution data derived from the IUCN ranges and also with both current distribution data and historical records from 1616 to 1949 (Table 1).
Model tuning and evaluation
We used the area under the curve (AUC) of the receiver operating characteristic (ROC) curve to estimate the predictive power of our models. The maximum AUC value is 1, achieved by perfect discrimination between occupied and non-occupied cells, while a model with no better predictive ability than random choice will result in an AUC value of 0.5. In practice, models with an AUC above 0.75 are considered potentially useful (Phillips and Dudík 2008). We note that for presence-only data, as in the present study, the highest achievable AUC is < 1 (Phillips et al. 2006).
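For intuition, AUC can be computed as the probability that a randomly chosen presence cell receives a higher suitability score than a randomly chosen background cell, with ties counted as one half. The sketch below uses this rank-based definition; it is not Maxent's internal code, and the function name and scores are hypothetical.

```python
def auc(presence_scores, background_scores):
    """Rank-based AUC: P(presence score > background score), ties count 0.5."""
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# Made-up suitability scores.
perfect = auc([0.9, 0.8], [0.1, 0.2])  # every presence outranks every background
partial = auc([0.9, 0.4], [0.5, 0.3])  # three of four pairs correctly ordered
```

The double loop is quadratic in the number of points, which is acceptable for a demonstration but slow for full background samples.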
AUC values may not always be the optimal way to evaluate model performance, e.g. because AUC weighs omission and commission errors equally, and the geographical extent over which models are run can strongly influence the AUC values (Lobo et al. 2008). Therefore, the models were also evaluated using the true skill statistic (TSS), which in contrast to kappa is independent of prevalence (Allouche et al. 2006). TSS uses the same scale as kappa, ranging from −1 to 1, where values of 0 or below indicate performance no better than random and positive values are interpreted as 0–0.4 = poor, 0.4–0.5 = fair, 0.5–0.7 = good, 0.7–0.85 = very good, 0.85–0.9 = excellent and 0.9–1 = perfect.
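TSS is defined as sensitivity + specificity − 1 (Allouche et al. 2006). A minimal sketch from confusion-matrix counts follows; the function name and the counts are made up for illustration.

```python
def true_skill_statistic(tp, fn, tn, fp):
    """TSS = sensitivity + specificity - 1, from confusion-matrix counts.

    tp/fn: presences predicted present/absent.
    tn/fp: absences predicted absent/present.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


# Made-up counts: 90% of presences and 80% of absences correctly classified.
tss = true_skill_statistic(tp=90, fn=10, tn=80, fp=20)

# Multiplying the number of absences by 10 leaves TSS unchanged,
# which illustrates its independence of prevalence.
tss_low_prevalence = true_skill_statistic(tp=90, fn=10, tn=800, fp=200)
```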
In addition to TSS and AUC, we used the R package ENMeval version 0.1.1 (Muscarella et al. 2014) to tune the feature and regularization multiplier settings in Maxent, and ENMTools version 1.4.4 (Warren et al. 2008, 2010) to calculate the sample-size-corrected Akaike Information Criterion (AICc) and Akaike weights for our models. AICc has been shown to outperform AUC-based methods for model selection in many cases (Warren and Seifert 2010). The final models were run using default settings, except that the regularization multiplier was set to 2.5 and the maximum number of iterations was set to 5,000; all feature types were used. To derive suitability and predictive maps, the final models were run 10 times with cross-validation as the replicated run type.
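The standard formulas are AICc = 2k − 2 ln L + 2k(k + 1)/(n − k − 1), with k parameters, n occurrence points and log-likelihood ln L, and Akaike weight w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), where Δ_i is the difference from the lowest AICc. The sketch below implements these formulas only; how ENMTools derives k and ln L from a Maxent model is not reproduced, and the input numbers are invented.

```python
from math import exp


def aicc(log_likelihood, k, n):
    """Sample-size-corrected AIC for k parameters and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return 2 * k - 2 * log_likelihood + (2 * k * (k + 1)) / (n - k - 1)


def akaike_weights(scores):
    """Relative support for each model given a list of AICc values."""
    best = min(scores)
    relative = [exp(-(s - best) / 2) for s in scores]
    total = sum(relative)
    return [r / total for r in relative]


# Invented log-likelihoods and parameter counts for two candidate models,
# with n = 142 to echo the number of current occurrence points.
scores = [aicc(-100.0, k=5, n=142), aicc(-98.0, k=9, n=142)]
weights = akaike_weights(scores)
```

In this invented case the second model has the higher likelihood but also more parameters, so the penalty terms leave the simpler model with the larger weight.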
Suitable habitat and potential distribution
Our predictive presence-absence map was derived using the 10th percentile training presence threshold, which selects the value above which 90% of the training samples are correctly classified. Selection of the best threshold can be difficult and depends on the sample size and the purpose of the study (Pearson et al. 2004; Liu et al. 2005; Freeman and Moisen 2008; Bean et al. 2012), but in general maximum training sensitivity and specificity, and equal training sensitivity and specificity, perform best (Jiménez-Valverde and Lobo 2007; Liu et al. 2013; Cao et al. 2013). These thresholds are more conservative than the minimum training presence threshold, which correctly predicts every training sample (see Supporting Information Appendix S5 for a comparison of the thresholds' effects on our predictive presence-absence map). As an alternative, which is better protected against overfitting, we also generated a rectilinear bioclimatic envelope model in ArcGIS, defined as the areas within the minimum and maximum values of the five climatic variables (PANN, PDryQ, MinTCM, MTCQ and MTWQ), using either only current distribution data derived from IUCN ranges or both current distribution data and historical records. We did this for all species combined and also for each species separately. The latter was done to compare the species- and genus-level results, mainly to check their consistency. Furthermore, the models with current and historical distribution data were fitted using both current climate data and current climate data adjusted by −0.7, −1.5 and −2.0 °C, respectively, to provide estimates accounting for the cooler climate during the period that the historical records represent.
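Both ideas in this paragraph are simple enough to sketch. The first function picks a 10th percentile training presence threshold using one simple indexing convention (Maxent's exact percentile rule may differ slightly); the other two fit and apply a rectilinear envelope as per-variable minimum and maximum values. All function names and data values are hypothetical, and this is not the ArcGIS procedure used in the study.

```python
def percentile_training_threshold(scores, percentile=10):
    """Suitability value below which roughly `percentile` % of presences fall."""
    if not scores:
        raise ValueError("scores must be non-empty")
    ordered = sorted(scores)
    index = (len(ordered) * percentile) // 100  # integer arithmetic, no rounding issues
    return ordered[index]


def fit_envelope(records, variables):
    """Per-variable (min, max) over the occurrence records."""
    return {v: (min(r[v] for r in records), max(r[v] for r in records))
            for v in variables}


def in_envelope(cell, envelope):
    """True if the cell lies within the min-max range of every variable."""
    return all(low <= cell[v] <= high for v, (low, high) in envelope.items())


# Made-up training suitabilities: 9 of 10 lie at or above the threshold.
train_scores = [0.12, 0.35, 0.41, 0.48, 0.55, 0.61, 0.67, 0.73, 0.82, 0.91]
threshold = percentile_training_threshold(train_scores)

# Made-up climate values at two occurrence points (two variables for brevity).
records = [{"PANN": 900, "MinTCM": -8}, {"PANN": 1500, "MinTCM": 2}]
envelope = fit_envelope(records, ("PANN", "MinTCM"))
inside = in_envelope({"PANN": 1200, "MinTCM": -3}, envelope)
outside = in_envelope({"PANN": 1200, "MinTCM": -12}, envelope)
```

Because the envelope depends only on extreme values, adding historical records can only widen it, never shrink it.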
To assess how much of the climatically suitable area had tree cover above a certain percentage and was within protected areas, the outputs from the rectilinear bioclimatic envelope models were overlaid with tree cover, derived from the Moderate Resolution Imaging Spectroradiometer (MODIS) Vegetation Continuous Fields from 2010 (DiMiceli et al. 2011), and protected areas in China (IUCN and UNEP-WCMC 2014).
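Because all layers share an equal-area 5 km × 5 km grid, such an overlay reduces to counting cells. The sketch below assumes each cell is a dictionary with hypothetical keys `suitable`, `tree_cover_pct` and `protected`; the tree cover cutoff is a parameter, and treating values equal to the cutoff as forested is an assumption.

```python
CELL_AREA_KM2 = 25  # one 5 km x 5 km cell in an equal-area projection


def overlay_summary(cells, min_tree_cover_pct):
    """Area (km2) of suitable cells, and of those that are forested or protected."""
    suitable = [c for c in cells if c["suitable"]]
    # Assumption: cells exactly at the cutoff count as forested.
    forested = [c for c in suitable if c["tree_cover_pct"] >= min_tree_cover_pct]
    protected = [c for c in suitable if c["protected"]]
    return {
        "suitable_km2": len(suitable) * CELL_AREA_KM2,
        "forested_km2": len(forested) * CELL_AREA_KM2,
        "protected_km2": len(protected) * CELL_AREA_KM2,
    }


# Made-up cells for illustration.
cells = [
    {"suitable": True, "tree_cover_pct": 70, "protected": True},
    {"suitable": True, "tree_cover_pct": 30, "protected": False},
    {"suitable": True, "tree_cover_pct": 55, "protected": False},
    {"suitable": False, "tree_cover_pct": 90, "protected": True},
]
summary = overlay_summary(cells, min_tree_cover_pct=50)
```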