For the current climate (1950–2000) we used the WorldClim global climate data set at 2.5 arcminute resolution (Hijmans et al. 2005). The dataset provides interpolated climate layers for 19 bioclimatic variables derived from historical data. These variables summarize patterns in monthly weather station data, e.g. annual means, extremes and seasonality of temperature and precipitation.
We used five GCMs from the IPCC’s 5th assessment report (Stocker et al. 2013) to obtain future climate data: GFDL-ESM2M, HadGEM2-ES, IPSL-CM5A-LR, MIROC-ESM-CHEM and NorESM1-M. These GCMs are representative of the projected changes in global mean temperature and precipitation (Warszawski et al. 2014). We downscaled the GCM outputs using the delta method (Ramirez and Jarvis 2010): we computed the difference between each model's output for current conditions and its mean for the 2040–2069 time-slice, smoothed the resulting anomaly layers to 2.5 arcminute resolution, and applied them to the WorldClim layers for the current climate. The result was a bias-corrected, high-resolution surface of the 19 bioclimatic variables for both the current climate and the 2050 time-slice.
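The delta method can be illustrated with a minimal sketch for a single variable. It assumes the coarse GCM grid nests exactly into the fine grid by an integer factor, and it uses simple block repetition (`np.kron`) as a stand-in for the smoothing/interpolation step; the function name and arguments are hypothetical. Only the additive form is shown, although some delta implementations use relative change for precipitation.

```python
import numpy as np

def delta_downscale(gcm_current, gcm_future, worldclim, factor):
    """Minimal additive delta-method sketch for one bioclimatic variable.

    gcm_current, gcm_future: coarse 2-D GCM grids for the baseline and the
        2040-2069 mean (hypothetical inputs).
    worldclim: fine baseline grid of shape (rows * factor, cols * factor).
    factor: integer refinement factor (assumption: grids nest exactly).
    """
    # 1. Anomaly (delta) between future and current GCM climate.
    anomaly = gcm_future - gcm_current
    # 2. Bring the anomaly to the fine grid; block repetition stands in
    #    for the smoothing/interpolation used in practice.
    anomaly_fine = np.kron(anomaly, np.ones((factor, factor)))
    # 3. Add the anomaly to the high-resolution baseline. The GCM's own
    #    baseline bias cancels because only the difference is used.
    return worldclim + anomaly_fine
```

For example, a coarse 1 × 2 anomaly of 1.5 °C and 2.0 °C applied to a uniform 18 °C fine grid with `factor=2` yields a 2 × 4 grid of 19.5 °C and 20.0 °C blocks.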
Present occurrence data
Present occurrence locations identify climates that are currently suitable for producing coffee. We derived the occurrence points from three sources: (i) geo-referenced coffee farms; (ii) geo-referenced municipalities in Brazil that produce coffee; and (iii) coffee-growing areas identified and geo-referenced in Google Earth where data sources (i) or (ii) were not available.
Most occurrence points came from a global database of 62,000 geo-referenced individual farms with predominantly C. arabica and some C. canephora. The International Center for Tropical Agriculture (CIAT) developed the database during several regional projects that were conducted in collaboration with coffee cooperatives and cooperating research organizations.
A comprehensive set of occurrence records covering all coffee-producing regions is desirable so that all suitable climates are represented in the database (Elith et al. 2011). We therefore supplemented the geo-referenced data with additional occurrence points generated from publicly available information on the distribution of coffee production, using satellite imagery to pinpoint precise locations.
Unlike the C. canephora data, the C. arabica location data were not collected for modeling and were therefore highly clustered in the project regions. To reduce this bias, we stratified the database using a principal-component analysis of the 19 bioclimatic variables to identify typical climates, and drew a random representative sample from each climate cluster. This reduced the original sample to 1772 unique presence locations for C. arabica.
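The stratification step can be sketched as follows. The text does not state how climate clusters were formed, so this sketch assumes, purely for illustration, that strata are cells of a regular grid over the first two principal components of the standardized bioclimatic variables, with `per_bin` random points kept per stratum. The function name and parameters are hypothetical.

```python
import numpy as np

def stratified_climate_sample(X, n_bins=10, per_bin=1, seed=0):
    """Thin clustered presence points by climate strata (illustrative sketch).

    X: (n_points, n_vars) matrix of bioclimatic values at presence locations.
    Strata are cells of an n_bins x n_bins grid over PC1 and PC2 (assumption).
    Returns the sorted indices of the retained points.
    """
    rng = np.random.default_rng(seed)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant variables
    Z = (X - X.mean(axis=0)) / sd  # standardize so units do not dominate
    # PCA via SVD: rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:2].T  # scores on PC1 and PC2
    # Combine the PC1 and PC2 bin numbers into one stratum label.
    stratum = np.zeros(len(X), dtype=int)
    for k in range(2):
        s = scores[:, k]
        inner = np.linspace(s.min(), s.max(), n_bins + 1)[1:-1]
        stratum = stratum * n_bins + np.digitize(s, inner)
    # Draw a random representative sample from each occupied stratum.
    keep = []
    for label in np.unique(stratum):
        members = np.flatnonzero(stratum == label)
        take = min(per_bin, members.size)
        keep.extend(rng.choice(members, size=take, replace=False).tolist())
    return np.array(sorted(keep))
```

Densely sampled climates collapse to a few points while rare climates keep their single representative, which is the intended debiasing effect.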
Neither the Arabica nor the Robusta database included all of the dominant growing regions in Brazil, where 36 % of global Arabica coffee is produced (USDA 2012). To ensure sufficient representation of Brazilian sites and climates, we included data provided by IBGE (2012). Using these data, we identified municipalities where at least 75 % of the coffee came from one or the other of the two species, and geo-referenced these municipalities for the appropriate species.
The combined geo-referenced dataset comprised 2861 unique pixel cells for C. arabica in 26 countries that together accounted for 92 % of global Arabica output in 1998–2002 (USDA 2012). For C. canephora it comprised 364 unique pixel cells in 11 countries that together accounted for 92 % of global Robusta output in 1998–2002 (USDA 2012) (Supplementary Material Table S1). Figure 1 shows the distribution of present coffee locations and the major production regions.
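Counting "unique pixel cells" means snapping point coordinates to the 2.5 arcminute grid and dropping duplicates. A minimal sketch, assuming a global grid anchored at 180° W and 90° N (the grid origin is an assumption; the function name is hypothetical):

```python
import numpy as np

RES_DEG = 2.5 / 60.0  # 2.5 arcminutes expressed in degrees

def unique_cells(lon, lat, res=RES_DEG):
    """Map lon/lat points to (row, col) cells of a global grid and drop duplicates."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    col = np.floor((lon + 180.0) / res).astype(int)  # columns from 180 deg W
    row = np.floor((90.0 - lat) / res).astype(int)   # rows from 90 deg N
    return np.unique(np.column_stack([row, col]), axis=0)
```

Two farms about 1 km apart usually fall into the same cell and therefore count once.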
To fit a function that describes suitable climates, the classification algorithms compare the pattern of variables at present occurrence locations with the pattern in environments that are potentially suitable. To characterize these environments, we took random background samples from locations that were not known presence locations.
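Drawing background samples from non-presence locations can be sketched on a raster mask. This assumes a boolean mask of eligible cells and presence cells given as (row, col) pairs; both the representation and the function name are hypothetical.

```python
import numpy as np

def sample_background(candidate_mask, presence_cells, n, seed=0):
    """Draw n random background cells from a candidate mask, excluding presences.

    candidate_mask: 2-D boolean array of cells eligible as background.
    presence_cells: iterable of (row, col) tuples of known presence cells.
    """
    rng = np.random.default_rng(seed)
    mask = candidate_mask.copy()
    for r, c in presence_cells:
        mask[r, c] = False  # known presences are never used as background
    flat = np.flatnonzero(mask)
    if n > flat.size:
        raise ValueError("not enough candidate cells for the requested sample")
    chosen = rng.choice(flat, size=n, replace=False)  # sample without replacement
    return np.column_stack(np.unravel_index(chosen, mask.shape))
```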
We chose the background samples to avoid both trivial classification and overtraining of the algorithms. In species distribution modeling there is a trade-off between predictive performance and the capability to generalize. For example, a model that always correctly separates known occurrence locations from the random background samples may be undesirable, because it underestimates the true environmental range whenever the known occurrence data represent the true distribution incompletely. Conversely, a more general model that always correctly predicts unknown presence locations may overestimate the environmental range.
To date, no optimization framework exists for defining background parameters and choosing modeling approaches (Elith and Graham 2009). Therefore, rather than using a single sampling strategy, we used a model ensemble. We based the ensemble on several background sampling parameters within reasonable ranges for (i) the geographical extent from which the background sample was drawn, and (ii) the number of samples. Furthermore, we accounted for remaining sampling bias in the location database using the biased-background sampling method of Dudík et al. (2005) (Supplementary Material Table S1).
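One way to implement biased-background sampling in the spirit of Dudík et al. (2005) is to draw background cells with probability proportional to a smoothed density of the presence records, so that the background shares the sampling bias of the presences. The Gaussian kernel, its bandwidth in cell units, and the function names are assumptions of this sketch, not the exact procedure used here.

```python
import numpy as np

def bias_weights(shape, presence_cells, bandwidth=2.0):
    """Gaussian kernel density of presence cells on a grid (distances in cells)."""
    rows, cols = np.indices(shape)
    dens = np.zeros(shape)
    for r, c in presence_cells:
        d2 = (rows - r) ** 2 + (cols - c) ** 2
        dens += np.exp(-d2 / (2.0 * bandwidth ** 2))
    return dens / dens.sum()  # normalize to a probability surface

def sample_biased_background(weights, n, seed=0):
    """Draw n distinct cells with probability proportional to the bias surface."""
    rng = np.random.default_rng(seed)
    flat = rng.choice(weights.size, size=n, replace=False, p=weights.ravel())
    return np.column_stack(np.unravel_index(flat, weights.shape))
```

Cells near heavily sampled regions are drawn more often, which offsets the sampling bias in the presence data rather than treating it as a climate signal.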
The geographic extent of background samples should reflect prior knowledge of the species distribution and suit the geographical scale of the study (VanDerWal et al. 2009). We employed three background concepts: political, biophysical and geographic. We defined the political background as all countries that produce Robusta or Arabica, respectively (USDA 2012; ICO 2013). We defined the biophysical background by limiting the environment to the observed range of annual mean temperature in each species' location sample (C. arabica: 14 °C–26.4 °C; C. canephora: 19.2 °C–27.8 °C). We defined the geographic background as a 4.5° buffer around present locations (about 500 km at the equator).
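The three background concepts can each be expressed as a raster mask. This sketch assumes an integer country raster, an annual mean temperature raster in °C, and planar distance in degrees for the buffer; all function names are hypothetical, and the buffer broadcast is memory-hungry for global grids. The final constant checks the stated conversion: at roughly 111.32 km per degree at the equator, 4.5° is about 501 km.

```python
import numpy as np

def political_mask(country_id, producer_ids):
    """Cells inside any coffee-producing country (country_id: integer raster)."""
    return np.isin(country_id, list(producer_ids))

def biophysical_mask(annual_mean_temp, t_min, t_max):
    """Cells within the observed annual mean temperature range (deg C)."""
    return (annual_mean_temp >= t_min) & (annual_mean_temp <= t_max)

def buffer_mask(lon, lat, pres_lon, pres_lat, radius_deg=4.5):
    """Cells within radius_deg (planar degrees) of any presence point."""
    d2 = (lon[..., None] - pres_lon) ** 2 + (lat[..., None] - pres_lat) ** 2
    return (d2 <= radius_deg ** 2).any(axis=-1)

# 4.5 degrees at about 111.32 km per degree at the equator.
BUFFER_KM_AT_EQUATOR = 4.5 * 111.32
```

For C. arabica, for instance, `biophysical_mask(bio1, 14.0, 26.4)` keeps cells whose annual mean temperature lies within the observed 14 °C–26.4 °C range.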
The literature generally agrees that the ratio of background samples to occurrence locations should be at least 1:1. Too few background samples do not allow a clear distinction between occurrence and background and commonly lead to over-prediction of the distribution, while too many background samples result in under-prediction (Barbet-Massin et al. 2012). We used background-sample to occurrence-location ratios of 1:1, 2:1, 4:1, 6:1 and 8:1.
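Read as background samples per occurrence location, consistent with the recommended minimum of 1:1, the five ratios translate into background sample sizes. This sketch assumes the ratios are applied to the full presence counts reported above (2861 cells for C. arabica, 364 for C. canephora); the function name is hypothetical.

```python
RATIOS = (1, 2, 4, 6, 8)  # background samples per occurrence location

def background_sizes(n_presence, ratios=RATIOS):
    """Number of background samples implied by each background:occurrence ratio."""
    return {r: r * n_presence for r in ratios}

arabica = background_sizes(2861)   # up to 22888 background samples at 8:1
canephora = background_sizes(364)  # up to 2912 background samples at 8:1
```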
For the climate suitability mapping we relied on the classification probabilities of three machine-learning algorithms: MaxEnt, Support Vector Machines (SVM) and Random Forest. MaxEnt (Phillips et al. 2006) is widely used to model species distributions in ecology (Merow et al. 2013). SVM is a widely used classification algorithm; we used the implementation in the R package “kernlab” (Karatzoglou et al. 2006). Random Forest (Breiman 2001) is an ensemble learning method that classifies data using multiple decision trees and has been shown to be useful in ecology (Prasad et al. 2006).
Machine-learning algorithms include a regularization parameter that lets the user adjust the trade-off between optimal model fit and generalization. Optimal parameter values usually depend on the characteristics of the input data. We therefore first identified relevant parameter values by conducting a grid search across the relevant parameter ranges. To assess generalization capacity, we used as a test data set the 25 % of occurrence points that were most distant from other points, and trained on the remaining, less dispersed 75 %. We chose three levels of regularization per algorithm that improved model generalization compared with default settings: 0.01, 5 and 20 for the MaxEnt regularization parameter β; 1, 0.5 and 0.05 for the SVM cost parameter C; and 8, 4 and 2 for the number of variables sampled at each node by Random Forest. The first value of each set was meant to produce a closely fitted model, while the last gave a general model.
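The distance-based test split can be sketched as holding out the 25 % of points with the largest nearest-neighbour distance. Interpreting "most distant from other points" as nearest-neighbour distance, and using planar distance in degrees, are assumptions of this sketch. The regularization levels are those listed in the text; the dictionary keys and function name are hypothetical.

```python
import numpy as np

# Regularization levels from the text, ordered from closely fitted to general.
REGULARIZATION = {
    "maxent_beta": (0.01, 5, 20),
    "svm_cost": (1, 0.5, 0.05),
    "rf_vars_per_node": (8, 4, 2),
}

def distant_split(coords, test_frac=0.25):
    """Hold out the test_frac of points farthest from their nearest neighbour.

    coords: (n, 2) array of lon/lat; planar distance in degrees (assumption).
    Returns (train_idx, test_idx), each sorted.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)          # ignore each point's distance to itself
    nn = dist.min(axis=1)                   # nearest-neighbour distance per point
    n_test = int(round(test_frac * len(coords)))
    order = np.argsort(nn)[::-1]            # most isolated points first
    return np.sort(order[n_test:]), np.sort(order[:n_test])
```

Testing on isolated points rewards models that extrapolate sensibly to sparsely sampled climates rather than those that memorize dense clusters.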
To assess the performance of the individual models we used two measures: the threshold-independent area under the receiver operating characteristic curve (AUC) and a calibrated AUC (cAUC). AUCs were calculated using 10-fold subsampling of training and testing data: each model was trained on 90 % of the location database and evaluated on the remaining 10 %, in ten replications.
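The AUC and the ten 90/10 replications can be sketched without any modeling library. The AUC below is the rank formulation: the probability that a presence scores higher than a background point, with ties counted as one half. The `fit_and_score` callback is hypothetical and stands in for training one of the three algorithms on the training folds and scoring the held-out fold.

```python
import numpy as np

def auc(pres_scores, bg_scores):
    """AUC as P(presence score > background score), ties counted as 1/2."""
    p = np.asarray(pres_scores, dtype=float)[:, None]
    b = np.asarray(bg_scores, dtype=float)[None, :]
    return float((p > b).mean() + 0.5 * (p == b).mean())

def kfold_indices(n, k=10, seed=0):
    """Split range(n) randomly into k folds; each fold is a 10 % test set once."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def cross_validated_auc(n, fit_and_score, k=10, seed=0):
    """fit_and_score(train_idx, test_idx) -> (presence_scores, background_scores)."""
    folds = kfold_indices(n, k, seed)
    aucs = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        aucs.append(auc(*fit_and_score(train_idx, test_idx)))
    return float(np.mean(aucs)), aucs
```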
The AUC is the standard measure for evaluating species distribution models. It summarizes how occurrence points are ranked relative to background samples: if all present sites score higher than all background sites its value is 1, while a value of 0.5 indicates a model no better than chance. The use of the AUC statistic has been criticized, however, as misleading when background samples are drawn from different background extents: low predictions at geographically distant locations are often trivial and inflate the statistic (Lobo et al. 2008). This is to be expected because climate patterns are usually spatially autocorrelated. We therefore calculated a cAUC as proposed by Hijmans (2012), calibrating the model AUC with the AUC of a trivial null model based on the inverse distance to the nearest training presence.
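The calibration can be sketched with a geographic null model that scores each evaluation point by the inverse distance to the nearest training presence. The calibration form used here, cAUC = AUC + 0.5 − max(0.5, null AUC), is our assumption of how the null AUC enters; consult Hijmans (2012) for the authoritative definition. Function names are hypothetical.

```python
import numpy as np

def auc(pres, bg):
    """Rank-based AUC, ties counted as 1/2."""
    p = np.asarray(pres, float)[:, None]
    b = np.asarray(bg, float)[None, :]
    return float((p > b).mean() + 0.5 * (p == b).mean())

def null_scores(train_xy, eval_xy):
    """Geographic null model: inverse distance to the nearest training presence."""
    d = np.sqrt(((eval_xy[:, None, :] - train_xy[None, :, :]) ** 2).sum(axis=-1))
    return 1.0 / np.maximum(d.min(axis=1), 1e-9)  # guard against zero distance

def null_auc(train_xy, test_pres_xy, test_bg_xy):
    """AUC that the distance-only null model achieves on the test data."""
    return auc(null_scores(train_xy, test_pres_xy), null_scores(train_xy, test_bg_xy))

def calibrated_auc(model_auc, null_model_auc):
    """Assumed calibration form: cAUC = AUC + 0.5 - max(0.5, null AUC)."""
    return model_auc + 0.5 - max(0.5, null_model_auc)
```

A model AUC of 0.9 against a null AUC of 0.8 gives a cAUC of about 0.6. This shows how much of the apparent skill comes from proximity alone.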
We estimated variable importance by computing the AUC of each predictor variable individually using the caret package in R (Kuhn 2008). This method applies a series of cutoffs to the predictor data, calculates sensitivity and specificity for each cutoff, and derives the AUC from them; this single-variable AUC serves as the measure of variable importance.
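The single-variable AUC can be sketched in Python. Sweeping every cutoff over a predictor and integrating sensitivity against 1 − specificity gives the same value as the rank formula below. Taking max(AUC, 1 − AUC), so that a variable counts as important whatever the direction of its effect, is a choice of this sketch and may differ from caret's exact behaviour; function names are hypothetical.

```python
import numpy as np

def single_variable_auc(x, y):
    """AUC of one predictor used directly as a score (y: 1 presence, 0 background)."""
    x = np.asarray(x, float)
    y = np.asarray(y, bool)
    p = x[y][:, None]
    b = x[~y][None, :]
    a = (p > b).mean() + 0.5 * (p == b).mean()
    return float(max(a, 1.0 - a))  # direction-agnostic (assumption of this sketch)

def variable_importance(X, y, names):
    """Map each variable name to its single-variable AUC."""
    return {n: single_variable_auc(X[:, j], y) for j, n in enumerate(names)}
```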
We trained the three algorithms across the parameter space described above: five ratios of background to presence samples, three regularization levels, and three background extents. We therefore trained 3 × 5 × 3 × 3 = 135 distinct models per species. We projected the trained and tested models onto raster data of the 19 bioclimatic variables for the current climate (WorldClim) and for the 2050 time-slice. This yielded maps of continuous scores indicating whether a pixel cell belonged to the absence or the presence class, which is equivalent to rating each global pixel cell’s climate as suitable or unsuitable for coffee production. We normalized individual model outputs to scores from 0 to 1 and averaged them for the baseline and for each emission scenario. To separate marginal from relevant suitability values, we set the threshold to the lowest score at any present location, and included in the analysis only pixel cells with suitability values above this threshold.
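The ensemble bookkeeping can be sketched in a few lines. The grid of configurations reproduces the 3 × 5 × 3 × 3 = 135 count. Each model map is min-max normalized to 0–1, the maps are averaged, and the threshold is the lowest averaged score at any presence cell. The labels and the function name are hypothetical, and the maps are assumed to be NaN-free 2-D arrays on a common grid.

```python
import itertools
import numpy as np

ALGORITHMS = ("maxent", "svm", "random_forest")
RATIOS = (1, 2, 4, 6, 8)
REGULARIZATION_LEVELS = ("fitted", "intermediate", "general")
EXTENTS = ("political", "biophysical", "geographic")

CONFIGS = list(itertools.product(ALGORITHMS, RATIOS, REGULARIZATION_LEVELS, EXTENTS))

def ensemble_suitability(model_maps, presence_cells):
    """Normalize each map to 0-1, average, and keep cells above the presence threshold.

    model_maps: list of 2-D arrays, one per model, for one scenario.
    presence_cells: list of (row, col) tuples of known presence locations.
    Returns (map with cells at or below the threshold set to NaN, threshold).
    """
    normed = []
    for m in model_maps:
        lo, hi = m.min(), m.max()
        scale = hi - lo if hi > lo else 1.0  # guard against constant maps
        normed.append((m - lo) / scale)
    mean = np.mean(normed, axis=0)
    rows, cols = zip(*presence_cells)
    threshold = mean[list(rows), list(cols)].min()  # lowest score at a presence
    return np.where(mean > threshold, mean, np.nan), threshold
```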
We compared impacts across latitude and altitude classes using the sums of suitability scores in 1° latitude classes and 100 m altitude classes. We analyzed regional impacts for 12 coffee-producing regions (Fig. 1).
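Summing suitability scores by latitude or altitude class is a binned sum. The sketch below assumes per-cell arrays of suitability (NaN where a cell falls below the threshold) and of latitude or elevation. The change per class would then be the 2050 sum minus the baseline sum for the same class. The function name is hypothetical.

```python
import numpy as np

def sum_by_class(values, key, width):
    """Sum values within classes of `key` of the given width (e.g. 1 degree, 100 m).

    Classes are labelled by their lower bound; NaN values are ignored.
    """
    values = np.asarray(values, float)
    key = np.asarray(key, float)
    ok = ~np.isnan(values)
    bins = np.floor(key[ok] / width).astype(int) * width
    classes, inverse = np.unique(bins, return_inverse=True)
    sums = np.zeros(classes.size)
    np.add.at(sums, inverse, values[ok])  # accumulate values per class
    return dict(zip(classes.tolist(), sums.tolist()))
```

For example, cells at 950 m, 1020 m and 1099 m fall into the 900 m, 1000 m and 1000 m classes respectively.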
We used the GLC2000 global land cover database (European Commission 2003) to partition suitability changes between land with forest cover (GLC2000 global categories 1–9) and land without forest cover, including agricultural land (GLC2000 global categories 10–18). Tropical forests provide diverse ecosystem services, are richer in species and hold larger carbon stocks than coffee plantations (De Beenhouwer et al. 2013). Coffee plantations, however, often have more biological diversity than other agricultural land (Moguel and Toledo 1999) and hold larger carbon stocks (van Rikxoort et al. 2014). Therefore, converting natural forest to coffee would have a negative environmental impact, whereas converting open land to coffee plantations could have a positive effect.
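The land-cover partition can be sketched as masked sums over a GLC2000 class raster aligned with the suitability-change raster. It uses the category ranges given in the text: 1–9 for forest and 10–18 for land without forest cover. Cells in other categories, such as water or bare areas, are left out. The function name and the aligned-grid assumption are hypothetical.

```python
import numpy as np

def partition_by_landcover(change, glc):
    """Sum suitability change on forest (GLC2000 1-9) and non-forest (10-18) cells."""
    change = np.asarray(change, float)
    glc = np.asarray(glc)
    forest = (glc >= 1) & (glc <= 9)
    non_forest = (glc >= 10) & (glc <= 18)
    return {
        "forest": float(np.nansum(change[forest])),
        "non_forest": float(np.nansum(change[non_forest])),
    }
```

A positive total on forest cells flags potential pressure for deforestation, whereas gains on non-forest land point to expansion that may be environmentally less harmful.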