Introduction

As part of the peace agreement signed in 2016, the Colombian Government resolved to implement a crop replacement program to eradicate coca crops (PNIS 2017; Barrera-Ramírez et al. 2019; Torres Rodríguez et al. 2020). In the search for suitable legal crops, cacao was identified as viable, given its potential to generate income for small farmers (Castañeda-Álvarez et al. 2016). Despite Colombia’s ideal location and conditions for growing cacao, its yield is low, at just 0.53 tons per hectare in 2018 (Agronet 2020). Colombia contributes only 1% of the world’s cocoa production, whereas the Ivory Coast, the biggest producer, contributes nearly 40% (Abbott et al. 2018). To strengthen this weak sector, initiatives such as “Cacao for Peace” were developed to boost research, technical assistance, and education (USDA 2016).

There are several threats to cacao cultivation in Colombia, including disease, extreme climate variability and deforestation. Moniliophthora roreri, the fungus that causes frosty pod rot, has affected cacao quality and production in the highest-producing regions of the country and across Latin America (Jaimes et al. 2016). Extreme climate variability also threatens cultivation, as cacao is more affected by rainfall than by any other climatic variable (Omolaja et al. 2009; Mena-Montoya et al. 2020). Cacao trees are susceptible to water deficiency, and cacao cultivation has waned in areas of Colombia that have suffered from drought, such as the eastern plains of the Casanare Department in 2014, the Guajira peninsula of the Caribbean region in 2015, and the Amazon region in 2005 (IDEAM 1996, 2017). Land tenure insecurity, illicit cropping and illegal logging have prompted deforestation at an unprecedented rate in Colombia. According to IDEAM (2018), around 198,000 hectares of forest were cleared in 2018.

Exploring the diversity and distribution of Crop Wild Relatives (CWR) could contribute to the development of conservation strategies, particularly by identifying and preserving species that are important for their potential use in crop improvement programs (Maxted et al. 2006). CWR are an essential component of crop diversity as they contain unique traits which can provide resilience and potentially generate more productive varieties of domesticated crops (Maxted et al. 2010, 2012; Maxted and Kell 2009; Ford-Lloyd et al. 2011; Phillips et al. 2017; Rahman et al. 2019; Vincent et al. 2013, 2019). Wild species have traits that can enhance resistance to changing climate conditions or attacks of diseases (Nevo and Chen 2010; Dempewolf et al. 2014). In the case of cacao in Colombia, cacao CWR may be less susceptible to Moniliophthora roreri because they are tolerant of high humidity. Wild Theobroma cacao is already adapted to flooded habitats in the Amazon region, but the incorporation of such traits into cultivation still lags behind (Wittmann et al. 2013). Vincent et al. (2013) established a list of priority taxa of high importance for global and regional food security and found that, within cacao’s gene pool, three taxa of Theobroma cacao were of high importance for smallholder agriculture in the tropics. Despite their importance, Castañeda-Álvarez et al. (2016) found that the diversity of CWR is poorly represented in gene banks in South America, among other regions, and concluded that a systematic effort is needed to improve the conservation and availability of CWR for use in plant breeding.

Wild cacao’s diversity in the Neotropics is made up of 22 species of the genus Theobroma (Cuatrecasas 1964) and 18–20 species of its sister genus Herrania (Schultes 1968). The primary gene pool of cacao is composed of the cultivated Theobroma cacao L. (T. cacao) and its close wild relatives, such as T. cacao subsp. cacao and subsp. sphaerocarpum. The tertiary gene pool is composed of other wild species of Theobroma. The species most closely related to T. cacao that have agronomic traits are Theobroma bicolor Humb. & Bonpl. and Theobroma grandiflorum (Willd. ex Spreng.) K. Schum. Within the cultivated varieties of T. cacao there are three types of cacao: Criollo, Forastero and Trinitario (Cuatrecasas 1964). Criollo originates from Mesoamerica and northern South America, while Forastero is highly diverse and composed of multiple genetic groups distributed in the Amazon region (Motamayor et al. 2002). Trinitario is a hybrid of Forastero and Criollo cacao (Motamayor et al. 2003). Colombia has limited access to readily available and digitised datasets of these species’ occurrences, which limits our ability to extrapolate and understand the climate conditions in which the cultivated types can grow. Furthermore, very little is known about species richness and endemism among cacao CWR in Colombia. Most published case studies report the presence of the species in certain places, but a comparison between observed and predicted diversity at various geographic scales has not yet been documented (Baker et al. 1954; Cuatrecasas 1964; Richardson et al. 2015). Colombia is not an isolated case: documentation of other plant groups of wild relatives around the world is also limited (Hijmans and Spooner 2001; Jarvis et al. 2003, 2008; Ramirez-Villegas et al. 2010; Freely and Silman 2011; Ng’uni et al. 2019). To improve conservation of cacao’s gene pool, it is essential not only to further develop breeding programs of cultivated varieties (Rodriguez-Medina et al. 2019) but also to collect new material from unexplored wild populations (Maxted et al. 2007; Lala et al. 2017; Kell et al. 2017).

Our study uses species distribution modelling (SDM) to predict the distribution of wild cacao. SDM is a predictive tool that can be used to identify biodiversity patterns or the most suitable environmental conditions for a species (Araujo and Guisan 2006; Loiselle et al. 2008; Thuiller et al. 2008; Ramirez-Villegas et al. 2010; Parra-Quijano et al. 2012; Jarvis et al. 2015; Phillips et al. 2016, 2017, 2019; Kindt 2018; Araujo et al. 2019; Vincent et al. 2019; Ratnayake et al. 2020). In particular, SDM can be used to map biodiversity and select new protected areas for conservation (Gaston 1991; Gaston and Rodrigues 2003; Graham et al. 2004; Pollock et al. 2017; Richardson and Whittaker 2010). Although SDM provides insightful results, it is not always reliable. Recent studies have found that in some cases the results of SDM were inaccurate because the available data were limited or did not represent the environments where species occur or could live (Peterson and Soberon 2012; Araujo et al. 2019). To address these inaccuracies, more collections are needed (Rondinini et al. 2006; Schmidt-Lebuhn et al. 2012).

To date, few studies have applied SDM to the study of wild cacao species. Two examples are Thomas et al. (2012) and Zarrillo et al. (2018). Both predicted species diversity of cacao on a global scale and found high genetic diversity and richness in the upper Amazon areas between Colombia, Ecuador, Peru and Brazil. They predicted that 9 to 11 species of Theobroma existed in the Colombian Amazon based on modelled species diversity.

Our study aims to improve the collection of cacao CWR and our understanding of their spatial biodiversity at national and regional levels. We present our analyses of newly collected cacao CWR data from two expeditions to the upper Amazon and central Chocó regions; map biogeography patterns of species diversity and endemism; apply the latest SDM techniques to map cacao CWR climate suitability; and propose germplasm collections and habitat protection strategies for cacao CWRs in Colombia.

Methods

Expeditions

The centre of origin of cacao is in the Amazon (Vavilov 1939; Hawkes 1999; Motamayor et al. 2002; Patiño 2002; Dos Santos Dias et al. 2003; Chumacero de Schawe et al. 2013; Berlingeri and Crespo 2012). The highest level of species diversity among cacao CWR is found in the centres of species diversity, including the northwestern part of the Amazon in Colombia, Ecuador and Peru, where between nine and eleven species have been recorded (Thomas et al. 2012).

Between 2018 and 2019, we conducted two detailed surveys in search of cacao CWR for herbarium collections. The main objective of our expeditions was to collect and catalogue the biodiversity of cacao CWR in unexplored regions. Germplasm was excluded from our collections because the purpose was to find regions where cacao CWR grow and to explore their environments. In this paper we report the botanical collections only; however, we also collected samples of associated biodiversity in two bioregions with different ecosystems: the southernmost area of the Serranía de Baudó in central Chocó (Fig. 1a) and the upper Caguán and Caquetá Rivers in the upper Amazon basin (Fig. 1b). Both places are remote and climatically unique, and few biological collections of cacao CWR have been made there. The Chocó region is recognised globally as a biodiversity hotspot (Myers et al. 2000). It is rich in endemism and diversity due to its unique geological and climatic conditions, including extreme rainfall records (Pérez-Escobar et al. 2019). The upper Caguán and Caquetá Rivers are located in the centre of origin of cacao, in remote areas that have not been explored since the 1950s due to the armed conflict (Franzoi 2009). The region is a transitional zone between the Amazon basin and the Andean foothills of the eastern cordillera, where the climatic zone is different from the surrounding areas (Fig. S1).

Fig. 1

Species distribution of cacao CWR in Colombia, a spatial distribution of Theobroma in the western Pacific region, b spatial distribution of Theobroma in the Amazon region, c spatial distribution of Herrania species in the western and northern Pacific region, d spatial distribution of Herrania species in the Amazon region

The expeditions began with an exploratory trip to survey the regions and consult the surrounding communities. We selected potential sampling sites based on accessibility (i.e. proximity to rivers and roads), the condition of the forests and advice provided by the communities living near the sites. Poster-size maps of each site were presented to the communities, together with a field guide to wild cacao species (in the Field Museum of Chicago format) to help identify and locate them. We relied on the help of the community members who were most familiar with the territory to identify the best sites.

The first expedition to the Amazon took place in July 2018. We travelled by boat on the Caguán and Caquetá rivers in the northernmost tip of the Amazon for approximately 1000 km over 22 days. We collected samples at five sites along the rivers, each with different landscapes (Fig. 2; Fig. S2; Table S1). The second expedition to Chocó took place in March 2019. We travelled for 98 km by car from Quibdó to the La Victoria municipality where we stayed for 15 days. La Victoria is located between the Baudó River on the west and Atrato River on the east, and between two tectonic zones, the Baudó and Atrato faults (Duque-Caro 1990) where the topography is hilly and there are large catchments. On our expeditions, we collected samples on either side of the catchments within a radius of 10 km from La Victoria.

Fig. 2

Diversity of wild cacao species collected during the expeditions, a the landscape along the Caguán River in one of the sampling sites, b view of the hilly landscapes in La Victoria Chocó facing the Baudó catchment, c planted variety of Theobroma glaucum cultivated by local farmers in Mecaya along the Caquetá River, d a tree trunk of Theobroma bernouilli sampled in one of the sites in Chocó, e species of the genus Herrania from the Caguán River in the Amazon, f species of the genus Herrania from La Victoria, Chocó, g a wild Theobroma cacao pod from trees growing in a flooded area in the Caguán River, h a wild species of Theobroma cacao in its initial stage of domestication found in Mecaya, Caquetá River, Amazon, Colombia

Species distribution datasets

Two types of data were used in our analysis: fieldwork and historical (Fig. 1). During the expeditions, a fieldwork dataset was compiled consisting of herbarium collections of cacao CWR. Samples of branches were taken from different wild cacao species and, where possible, fruits and flowers were collected. The fieldwork dataset was made up of 211 herbarium collections: 174 samples of Theobroma and 37 of Herrania. Of the Theobroma samples, 63 belong to wild, non-domesticated T. cacao, which is found only on the riverbanks. All samples were geo-referenced, processed in the field and stored as dry specimens at the herbarium of the Universidad de los Andes (https://bit.ly/39A1qtw). Using QGIS Desktop v. 3.10.2 (2019), we mapped each of the occurrences sampled during the expeditions.

A historical dataset of occurrence records was also used for the analyses (Yockteng et al. 2017). The historical dataset was generated from national and international museum and herbarium samples and was made up of 499 records of geo-referenced wild species: 376 Theobroma (10 species) and 123 Herrania (6 species). We used QGIS Desktop v. 3.10.2 (2019) to check each record for spatial accuracy. As part of the cleaning process, records with missing or incorrect spatial coordinates were either excluded or corrected where adequate information was available. During the pre-analysis of these data, we found low redundancy estimates for Theobroma and Herrania in the historical collections for Colombia (Fig. S3c): the ratio of species occurrences to number of samples per grid cell indicated low representation (0.5/1), with the highest value of 0.8/1 in just five grid cells across Colombia. These low values suggest an underestimation of observed diversity in the historical collection at the national level, likely due to a lack of fieldwork (Fig. S3a–b), which is a common pitfall in the study of biodiversity (Hickisch et al. 2019; Sporbert et al. 2019).

Although cultivated cacao was not considered in our expeditions’ field sampling strategy, in addition to the 211 herbarium collections we also collected samples of Theobroma cacao growing in an abandoned plot south of Remolino del Caguan. They were collected because they were growing near wild cacao populations and were laden with cacao pods (even though it was not harvesting season and no other cacao trees were in season). These four samples of Theobroma cacao were found on a riverbank that remains flooded for at least 3 months of the year and are thus potentially adapted to high humidity. They could provide valuable information on traits of cacao that adapt to flooding conditions and could be used to improve plant breeding strategies (Rodriguez-Medina et al. 2019). The samples are planted at Agrosavia’s La Libertad and Palmira experimental stations, and we are investigating whether they belong to the Forastero group of cultivated Theobroma cacao.

Predictor variables

Eight climatic predictor variables were used in the SDM: precipitation, maximum temperature, minimum temperature, mean temperature, diurnal temperature range, relative humidity, wind speed and solar radiation. We chose these variables because they are available at high spatial resolution and accurately represent the environments where cacao CWR can be found. These variables had a spatial resolution of ~1 km² and were extracted from the global WorldClim version 2 dataset (Fick and Hijmans 2017). Slope and elevation at a spatial resolution of ~1 km² were also included in the SDM (data from DAPA project, CIAT). The hydrography of the expedition site in the Amazon region was mapped using the WaterWorld tool, which uses the HydroSHEDS model applied at a spatial resolution of approximately 1 km² (Lehner et al. 2008; Mulligan 2010, 2013). The online geo-portal of the Colombian Geological Survey was used to identify the geological and landform units of the Amazon expedition region (SGC 2015). Neither hydrographic nor geological analysis was conducted for the Chocó region, given that the area was limited to one site covering a radius of 10 km.

Species distribution modelling

We used the historical and the fieldwork datasets to conduct our SDM analyses. We divided the SDM analyses into three components: (a) a national distribution based on the historical dataset; (b) a sub-national distribution based on the fieldwork and historical datasets, covering the Amazon and Pacific regions; and (c) a national distribution analysis based first on the historical dataset only and then on the historical dataset combined with the fieldwork expedition data as a separate sub-group.

To conduct our predicted distribution analyses, ensemble modelling at a scale of ~1 km² was carried out using the BiodiversityR software (Kindt and Coe 2005; Kindt 2018; González-Orozco et al. 2020). Each ensemble map consisted of fitted probability raster layers, based on individual suitability models of wild species, which were combined using a consensus of several models that predict presence. To create each map, the following 21 models were used: MAXLIKE, GBM, GBMSTEP, RF, GLM, GLMSTEP, GAM, GAMSTEP, MGCV, MGCVFIX, EARTH, RPART, NNET, FDA, SVM, SVME, BIOCLIM.O, BIOCLIM, DOMAIN, MAHAL, and MAHAL01. The probabilities of each model were weighted so that they could be compared. Total weights were calculated as the average weight of the different models (sub-models) across the k ensembles. The weight of each sub-model from the k ensembles was used to rank the models by their importance for predictability. The nine best models were selected to generate the ensemble. Following this, a 4-fold cross-validation (k = 4) was conducted for all 21 models, producing suitability estimates equivalent to maps of predicted distributions suited to Theobroma and Herrania.
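The weighting and consensus step described above can be illustrated with a minimal Python sketch. It is not the BiodiversityR implementation: the function name `weighted_ensemble`, the use of a single evaluation score per model (for example a mean cross-validated score) and the score-proportional weights are all assumptions made for illustration only.

```python
import numpy as np

def weighted_ensemble(suitability, scores, n_best=9):
    """Combine per-model suitability rasters into one weighted consensus raster.

    suitability: array of shape (n_models, rows, cols), values in [0, 1].
    scores: one evaluation score per model (hypothetical input, e.g. the
            mean score of that model over the k cross-validation folds).
    n_best: number of top-ranked models retained in the ensemble.
    """
    suitability = np.asarray(suitability, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Rank models from highest to lowest score and keep the n_best.
    keep = np.argsort(scores)[::-1][:n_best]
    # Assumption: weights are proportional to score and sum to 1.
    weights = scores[keep] / scores[keep].sum()
    # Weighted average over the model axis gives a (rows, cols) raster.
    return np.tensordot(weights, suitability[keep], axes=1)

# Toy example: three models on a 1 x 2 grid, keeping the two best.
toy_rasters = [[[0.2, 0.4]], [[0.6, 0.8]], [[1.0, 0.0]]]
toy_scores = [0.9, 0.6, 0.3]
consensus = weighted_ensemble(toy_rasters, toy_scores, n_best=2)
```

In the toy example the third model is dropped and the remaining two receive weights of 0.6 and 0.4, so each consensus cell is a weighted mean of the two retained suitability values.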

Species diversity and endemism

Biodiverse v2.0 software was used to calculate the observed diversity metrics of the historical dataset at a spatial resolution of 100 × 100 km (Laffan et al. 2010). Species richness represents the total number of taxa in each grid cell. Species endemism was calculated as corrected weighted endemism (CWE), a derivation of weighted endemism (WE). WE partitions the region of interest into equal-sized cells and, for each cell, sums the species present after weighting each by the inverse of its distribution range; CWE then divides this sum by the species richness of the cell (Crisp et al. 2001; Rosauer et al. 2009). Species richness calculated from herbarium datasets is often a poor estimate of real species richness (Schmidt-Lebuhn et al. 2012). Hence, an abundance-based Chao1 estimator of species richness was calculated for all cells of Theobroma and Herrania found in the historical dataset. Following Chao (1984), this index is calculated as $\mathrm{Chao1} = S_{\mathrm{obs}} + F_1^2 / (2 F_2)$, where $S_{\mathrm{obs}}$ is the number of observed species, $F_1$ the number of species found once and $F_2$ the number of species found twice. In the case of the fieldwork dataset, the number of taxa per site was counted in order to estimate species richness.
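As an illustration of the two indices, the following Python sketch computes Chao1 for one grid cell and CWE for a set of cells. It is not the Biodiverse implementation: the function names, the toy species labels and the bias-corrected fallback used when no species is found exactly twice are assumptions, and range size is taken to be the number of cells a species occupies.

```python
from collections import Counter

def chao1(abundances):
    """Chao1 = S_obs + F1^2 / (2 * F2) from per-species record counts in a cell."""
    counts = [n for n in abundances if n > 0]
    s_obs = len(counts)                      # observed species
    f1 = sum(1 for n in counts if n == 1)    # species found once
    f2 = sum(1 for n in counts if n == 2)    # species found twice
    if f2 == 0:
        # Assumption: bias-corrected form, which avoids division by zero.
        return s_obs + f1 * (f1 - 1) / 2.0
    return s_obs + f1 ** 2 / (2.0 * f2)

def corrected_weighted_endemism(cells):
    """cells maps a cell id to the set of species recorded in that cell."""
    # Range size of a species = number of cells in which it occurs.
    ranges = Counter(sp for species in cells.values() for sp in species)
    cwe = {}
    for cell, species in cells.items():
        if not species:
            cwe[cell] = 0.0
            continue
        we = sum(1.0 / ranges[sp] for sp in species)  # weighted endemism
        cwe[cell] = we / len(species)                 # corrected by richness
    return cwe

# Toy data with placeholder species labels.
richness_estimate = chao1([5, 1, 1, 2])
endemism = corrected_weighted_endemism({"A": {"sp1", "sp2"}, "B": {"sp1"}})
```

In the toy data, cell A holds one widespread and one single-cell species, so its CWE is higher than that of cell B, which holds only the widespread species.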

Measuring predicted distribution and diversity-endemism in protected areas

We estimated the percentage of predicted suitability and of diversity-endemism that falls within the network of protected areas in Colombia using QGIS Desktop v. 3.10.2 (2019). A map of the polygons of Colombia’s protected areas was overlaid on the grid cells of species richness and endemism (Fig. S4). The percentage of species richness and endemism present in protected areas was then estimated by counting the number of diversity grid cells that overlapped with the protected-area polygons. To estimate the percentage of the highest values of predicted suitability for Theobroma and Herrania that fall within protected areas in the Amazon and Pacific regions, we counted the number of grid cells in the top five percent of predicted suitability that overlapped with the protected-area polygons.
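The cell-counting step for the top five percent of predicted suitability can be sketched in Python as follows. The analysis itself was done in QGIS with polygon overlays; this sketch instead assumes that the protected areas have already been rasterised to a boolean mask on the same grid as the suitability raster, and the function name and toy arrays are hypothetical.

```python
import numpy as np

def share_of_top_cells_protected(suitability, protected, top_fraction=0.05):
    """Percentage of the highest-suitability cells that lie in protected areas.

    suitability: 2-D array of predicted suitability (NaN outside the study area).
    protected: boolean array of the same shape, True where a cell is protected
               (assumed to be pre-rasterised from the protected-area polygons).
    """
    suitability = np.asarray(suitability, dtype=float)
    protected = np.asarray(protected, dtype=bool)
    valid = ~np.isnan(suitability)
    # Suitability value that separates the top fraction of valid cells.
    threshold = np.quantile(suitability[valid], 1.0 - top_fraction)
    # Treat NaN cells as minus infinity so they are never selected.
    top = np.nan_to_num(suitability, nan=-np.inf) >= threshold
    return 100.0 * np.count_nonzero(top & protected) / np.count_nonzero(top)

# Toy 10 x 10 grid: suitability increases cell by cell, and the last two
# cells of the bottom row are protected.
toy_suitability = np.arange(100).reshape(10, 10) / 99.0
toy_protected = np.zeros((10, 10), dtype=bool)
toy_protected[9, 8:] = True
percent_protected = share_of_top_cells_protected(toy_suitability, toy_protected)
```

With polygon data, the same count could be obtained by first intersecting the grid with the protected-area layer; the mask-based form is used here only to keep the example self-contained.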

Results and discussion

Taxa collected on expeditions

Herbarium collections gathered during our expeditions in the Amazon (Fig. 2a) and Pacific regions (Fig. 2b) represent a high diversity of cacao CWR, with twenty-two taxa in total (Tables S1–S2) of the genera Theobroma (Fig. 2c, d) and Herrania (Fig. 2e, f) and of wild Theobroma cacao (Fig. 2g). In an isolated case, we came across a farmer in Mecaya who had collected seeds from the local wild Theobroma cacao and cultivated them on a small plot of land (Fig. 2h). This was the only case we came across of cacao CWR being cultivated, which could indicate its potential for local agriculture. In Chocó, we found twelve taxa of Theobroma and Herrania within a radius of 10 km, which indicates higher species diversity than is reported in the historical collections. Such a high number of taxa had never been reported before in a single locality. The field collections in Chocó represent nearly half of the 26 species of Theobroma in Colombia reported by Richardson et al. (2015). The herbarium collections of Theobroma and Herrania in the Amazon represent nine taxa. We surveyed away from rivers in the Amazon but only found wild Theobroma cacao near the rivers. We did not find wild Theobroma cacao in Chocó, as it only grows in the Amazon region.

Species richness and endemism of historical and collected taxa

The historical database found higher species richness of Theobroma (six species) in the Amazon region than the Pacific. However, we found higher species richness of Theobroma in the Pacific (nine species) compared to the Amazon. In the case of Herrania both the historical dataset and our collections reported higher species richness in the Pacific than in the Amazon (Fig. 3a–c). Both datasets found endemism to be higher in the Pacific region than in the Amazon for Theobroma. The historical dataset showed undefined patterns of endemism for Herrania in both regions while our collections found endemism to be high in the Pacific only (Fig. 3b–d).

Fig. 3

Spatial patterns of diversity of cacao in Colombia depicted from the historical collections, a–c species richness, b–d endemism, a–b Theobroma, c–d Herrania

Predicted distribution of historical and collected taxa

Wild cacao is known to be a lowland-restricted lineage (Richardson et al. 2015). However, we do not know to what extent lowland environments with similar characteristics outside its current distribution could be suitable for cacao CWR in Colombia. We found that 95% of the grid cells with the highest predicted suitability scores for Theobroma and Herrania in the Chocó and Amazon regions were not predicted by the historical dataset. Our SDM results estimate an increase of 50 to 70% in the suitable area after the newly surveyed field geo-locations were included in the SDM (Fig. 4). Both the Amazon and Pacific regions, which provide unique niche conditions and favor biodiversity (Myers et al. 2000; Richardson et al. 2015), showed high levels of predicted suitability (Fig. 4c, d, f, g). We also observed that predicted suitability patterns in the Amazon were mirrored in some parts of the Pacific when applying the distribution range of the Amazon. This could be because some taxa were present before the northwestern Andean mountain range uplifted (Richardson et al. 2018). Our observations point to potentially new habitats for cacao CWR; however, there are no records to validate this. Further collections should be undertaken in those areas to verify this observation.

Fig. 4

Spatial patterns of predicted distribution for historical and expeditions fieldwork records of cacao CWR in Colombia, a spatial data of Theobroma and Herrania from historical and expeditions collections, b–d predicted suitability of Theobroma, e–g predicted suitability of Herrania, b and e predicted suitability of the historical collections of Theobroma and Herrania, c–d–f and g the predicted suitability of both historical and expedition samples of Theobroma and Herrania, c and f predicted suitability for the Amazon regions, and d and g predicted suitability of Theobroma and Herrania for the Pacific and northern regions

In the Amazon we surveyed along the riverbanks of the Caquetá, Caguán and Putumayo Rivers and found that populations of wild Theobroma cacao were dominant, growing in floodplains and wetlands. Our predicted suitability results reflect such patterns, even in the areas we did not sample (Fig. 5a; Fig. S5–S6). This is an encouraging result: our model was able to identify distributional patterns using data from a small number of areas and to provide results for other areas. The results from our SDM are consistent with previous studies that find that hydrological conditions in the Amazon rivers are strong drivers of the distribution of wild Theobroma cacao (Cheesman 1944; dos Santos Dias et al. 2003; ter Steege et al. 2013). On the other hand, we found most wild Theobroma growing mainly in dry lands, distributed across hills, gullies, valley bottoms and ridges; however, in some cases we also observed them cohabiting with wild Theobroma cacao in the wetlands or on river banks. Our SDM results reinforced this observation: non-flooded areas had mid to high levels of predicted suitability for wild Theobroma. This suggests that the variation of micro-environments in the Amazon landscape promotes the presence of certain species but not others (Fig. 2a, b; Fig. S9). Furthermore, geological conditions can also affect the distribution patterns of wild cacao (Fig. S7). In relation to the distribution of Herrania species across the landscape in the Amazon, we found Herrania species growing more often in wet gullies. The areas of predicted suitability of Herrania species showed a greater distribution than those of Theobroma in the Amazon (Fig. 4f).

Fig. 5

Proposed priority sites in the Amazon and Pacific regions of Colombia for future germplasm collections and habitat protection of cacao CWR (a–b). For the Amazon region in a: Orinoquia-Amazonia foothills (1 and 3), San Jose del Guaviare (2), Caguán-upper Caquetá Rivers (4), upper Apaporis (5), Caquetá-Apaporis (6), lower Caquetá River (7). For the western Pacific and northern regions in b: southern Pacific (8), northern Pacific (9), inner west of the northern central valley (10), northern valleys of the central range (11), and western slopes of the north eastern ranges (12). Dashed lines indicate potential connectivity between sites

We found that Theobroma and Herrania grow in very different environments in the Pacific region compared to the Amazon. The number of species was much higher in the Pacific than in the Amazon. The hyper-humid conditions and rugged topography in Chocó were associated with greater species richness and endemism than in the Amazon (Fig. 5b; Fig. S8), consistent with other studies on tropical regions (Barthlott et al. 2005). According to the SDM, the areas of predicted suitability were large, particularly in the northern part of the Pacific. However, the areas of predicted suitability were greater for wild Theobroma species than for Herrania in the Pacific region (Fig. 4g).

Conservation strategies

There are many conservation strategies that could be used to improve conservation of cacao CWR, including identification of priority taxa, establishment of sites for active in situ conservation and development of ex situ collection plans (Maxted et al. 2008; Maxted and Kell 2009; Maxted et al. 2010; Hunter and Heywood 2010; Maxted et al. 2013; Rubio Teso et al. 2013; Magos Brehm et al. 2017; Dulloo and Thorman 2017; Zair et al. 2018).

We suggest a conservation strategy (Fig. 5) involving further expeditions to gather herbarium collections and germplasm samples and the establishment of new protected areas. These samples could be used to establish a more complete and representative dataset that accurately maps the distributional patterns of cacao CWR and to support plant breeding programs that improve the cultivation of cacao. Herbarium collections and germplasm samples should be collected from the following regions:

  1. upper Apaporis River and the lower Caquetá River (sites 5 and 6 in Fig. 5a) which connect the Colombian Amazon to the upper Japura River in the Amazon basin in Brazil;

  2. the lower Amazon area (site 7 in Fig. 5a) which has unique climate conditions compared with other Amazon regions in Colombia;

  3. unprotected areas in San Jose del Guaviare region (site 2 in Fig. 5a). This area is also under threat from deforestation and there are limited collections;

  4. the western slopes of the eastern ranges (site 12 in Fig. 5b).

Equally important but with greater urgency is the protection of habitat in these Amazon regions. Deforestation and illegal cropping are occurring at an alarming rate, threatening the environment in which cacao CWR grow. We observed both of these threats spreading into the lower Caguán and upper Caquetá Rivers region (site 4 shown in Fig. 5a; see Fig. S2 for a detailed map of the region) (https://bit.ly/2UFptjy; Betts et al. 2017; Armenteras et al. 2019; Bonilla-Mejía and Higuera-Mendieta 2019). These sites should be priority areas for conservation for the following reasons: (a) both sites are home to cacao CWR such as T. glaucum, T. subincanum and several populations of wild Theobroma cacao with traits adapted to their unique environments (Table S1); (b) this region has the largest continuous well-preserved forest in the Caquetá Department and connects the recently expanded Chiribiquete National Park with the Caquetá-Putumayo regions; and (c) this region has unique climatic conditions (Fig. S1) and extensive areas of suitable habitat (Fig. 5a site 4). In addition to the lower Caguán and upper Caquetá Rivers region the Government could consider protecting:

  1. the area between the Orinoquia and Amazonian foothills (Fig. 5a sites 1 and 3) which serves as a geographical connection between both areas;

  2. the western and northern central ranges lowlands (sites 10 and 11 in Fig. 5b) which serve as a geographical connection between the inter-Andean wet forests and the western Pacific region; and

  3. the Pacific region (sites 8 and 9 in Fig. 5b) because they are geographically isolated and contain endemic and unique species such as Theobroma chocoense.

Conclusion

Our study reveals new data on the taxa distribution and climate suitability of cacao CWR in unexplored regions of Colombia. We identify several challenges to the conservation of cacao CWR, the most critical being the need to address collection gaps and investigate climatic adaptations (Garcia et al. 2017; Parra-Quijano et al. 2019; Zhang and Batley 2019). Field surveys in our proposed regions should be undertaken to improve our understanding of the spatial patterns of cacao CWR. A conservation strategy should include actions to address the real threats to wild cacao populations, employing both in situ strategies, such as protecting natural habitat, and ex situ strategies, such as conducting germplasm collections, identifying priority species, and establishing cacao CWR seed banks for plant breeding purposes.