Among the vast array of more than 7,000 edible plants, over 400 species are considered major food crops (Ulian et al. 2020). These range from species with minimal alterations to their wild phenotype to distinct cultivars or cultigens with specific features advantageous to humans. In fact, human communities have actively managed and selected plants since the early Holocene (Clement et al. 2010; Watling et al. 2018), favoring traits that enhance edibility, productivity, nutritional value, or other desirable characteristics. While comprehensive syntheses exist for the geographical origins of selected crops such as maize (Wang et al. 2017), tobacco (Duke et al. 2021), or rice (Sweeney and McCouch 2007; Gutaker et al. 2020), the geographic history of many other important crops remains elusive.

One such crop with a rich history is cacao (Theobroma cacao L., Malvaceae), which was cultivated and utilized by Mesoamerican societies for centuries before Columbus’s arrival in 1492, primarily for its use in a bitter drink (Cuatrecasas 1964; Bletter and Daly 2006). Today, cacao is cultivated for its fermented seeds, which are essential for chocolate production. Cacao, known for its shade tolerance, thrives in various agroforestry scenarios, whether under thinned forests typical of areas on the Brazilian Atlantic coast (cabruca system), under temporary shading amidst food crops, or in the presence of introduced tree species for definitive shading (Sambuishi et al. 2012; Gama-Rodrigues et al. 2021). The combination of cacao with both woody (e.g., Erythrina spp., Hevea spp.) and non-woody species (e.g., banana, cassava) exemplifies the compatibility and sustainability of multistrata production systems (Gama-Rodrigues et al. 2021). Cacao-based agroforestry systems are pivotal for sustainable development in emerging countries, especially in South America and Mesoamerica (Zequeira-Larios et al. 2021). These systems emulate the attributes of natural forests and mitigate human pressure on the original forest cover in cacao regions.

Beyond plantations, cacao, along with other related species, thrives in lowland rainforests of the Americas. Traditionally assigned to two related genera, Theobroma L. (22-23 species, Cuatrecasas 1964) and Herrania Goudot (17 species; Schultes 1958), these understory trees bear fruits that are typically known as “cacao,” “cupuí,” “cacaorana” or similar, with cacao (T. cacao) being the most widely recognized species. It should be noted that most of these species are primarily native to the Neotropical region (Colli-Silva et al. 2023a). Very few species, however, are cultivated or extensively used by humans; these include, e.g., T. grandiflorum (Willd. ex Spreng.) K. Schum. and T. bicolor Humb. & Bonpl. (Cuatrecasas 1964; Bletter and Daly 2006).

The history of cacao is more complex than previously assumed. The species is estimated to have originated around ten million years ago (Richardson et al. 2015), but the role humans played in establishing its current broad distribution across the Tropics is not entirely clear. Genomic studies have revealed that cacao’s domestication involved the introduction of Ecuadorian varieties into Mesoamerica, likely facilitated by indigenous populations (Cornejo et al. 2018). Furthermore, archaeological findings have provided insights into the consumption of cacao in present-day Ecuador over 5,000 years ago (Zarillo et al. 2018), emphasizing the intricate history of cacao’s cultivation and utilization in the Americas and underscoring the role played by indigenous societies in its dispersion and consumption in areas where it is found growing today, even in a seemingly “wild” condition. Historical evidence suggests that cacao was introduced into the South American Atlantic coastal forests in the eighteenth century, from where it spread to West Africa during the period of European colonization (Soria 1970; Motamayor et al. 2003).

Exploring the depths of Amazonian primary forests reveals a notable presence of cacao plants within areas exhibiting diverse levels of human impact, encompassing abandoned farms, degraded lands, and seemingly untouched dense forests. This distribution pattern implies that the historical native range of cacao might have been more restricted in the past due to its presumably limited natural dispersal abilities (as described e.g., in Cuatrecasas 1964). Human intervention has played a pivotal role in the introduction, selection, and hybridization of cacao populations, contributing to the development of present-day cultivars (Cornejo et al. 2018), akin to the processes observed in other crops in the Amazon (see Clement et al. 2015; Levis et al. 2017). Consequently, cacao trees demonstrate adaptability across a spectrum of environments ranging from anthropized areas to primary or secondary forests.

Therefore, investigating the influence of human activity on the geographic distribution of cacao not only enriches our understanding of its original habitat but also has implications for genetic resources, crop development, conservation efforts, and discussions concerning the retention or retrieval of genetic data.

The objective of this study is to assess the impact of human influence on the distribution of cacao by comparing areas identified as native ranges with introduced areas. To achieve this, we compiled a comprehensive occurrence dataset by evaluating preserved specimen collections to better allocate the native ranges and introduced areas according to several criteria. Remote sensing images were obtained for locations where cacao specimens are found, and land use profiles were compared between introduced areas and the hypothesized center of origin of the species. We aim to provide insights into the role of human influence in the current distribution of cacao and discuss its potential implications for various aspects, including jurisdiction, access to genetic resources, conservation, and repatriation of genetic data. By doing so, we intend to contribute to both policymaking and academia, offering valuable information and novel perspectives on how cacao’s geographic distribution should be interpreted.

Material and methods

Literature survey and study area

A comprehensive literature survey was conducted encompassing studies that discuss the origin, distribution, and dispersal of cacao before and after human influence. This survey included classic botanical monographs of Theobroma by Bernoulli (1869), Schumann (1886), and Ducke (1925, 1940), as well as the most recent taxonomic treatment available for the genus by Cuatrecasas (1964). In addition, agronomic and historic literature was consulted to understand the association between known cultivars and the botanical circumscriptions of the species, which helped with formulating hypotheses regarding the origin and dispersal of cacao. Relevant works consulted in this regard included Morris (1882), Preuss (1901), van Hall (1914), Cheesman (1927, 1929, 1932, 1944), Pittier (1924), Pittier and Chevalier (1925), Pittier et al. (1926), Pound (1938, 1945), Ciferri and Ciferri (1957), Schultes (1984), Figueira et al. (1994), and Bartley (2005). Furthermore, studies that employed genomic data to delimit the origin and distribution of cacao were also reviewed. These studies, consulted for their insights, included Laurent et al. (1994), N’Goran et al. (1994), Motamayor et al. (2002; 2008), Motamayor and Lanaud (2002), Thomas et al. (2012), Clement et al. (2015), Lachenaud and Motamayor (2017), Osorio-Guarin et al. (2017), Cornejo et al. (2018), Zarillo et al. (2018), and Fouet et al. (2022).

This survey was important because it allowed us to determine more accurately the specific regions where T. cacao occurs in the wild, and it served as the foundation for defining major and minor regions of interest (Fig. 1) that were used for downstream analyses. The major areas were categorized as follows: (1) Areas of late introduction, where human introduction after the Pre-Columbian era (i.e., the period before Christopher Columbus arrived in the Americas in 1492) is well-documented in the literature; (2) Potential early introduction sites, where cacao may have been introduced during the Pre-Columbian era; (3) Areas of early introduction, where human introduction during the Pre-Columbian era is certain based on the literature; and (4) Potential native area of T. cacao based on the most recent evidence compiled here (namely Bartley 2005; Thomas et al. 2012; Clement et al. 2015; Cornejo et al. 2018; Fouet et al. 2022). To achieve a more detailed resolution for the tropical Americas, we further subdivided the region into smaller units using the biogeographical delimitations proposed by Morrone (2014) (Fig. 1).

Fig. 1

Summary of the main regions considered in this study and the presumed scenarios of the origin of T. cacao and its dispersal through the Neotropics, as proposed by various authors (detailed in Methods, “Literature survey and study area” section). Dispersal events are represented by purple arrows. In Scenario 4 (Cuatrecasas 1964), the red lines indicate the emergence of the Panama isthmus that facilitated overland dispersal of terrestrial organisms. In Scenario 5 (Cuatrecasas 1964), the yellow “!” stars represent the occurrence of several mutations that, according to these hypotheses, would have originated in various cacao morphotypes. Minor areas (sensu Morrone 2014) considered in this study were grouped into major regions based on the origin of the specimens, as described in Methods (“Literature survey and study area” section). The major regions include: (1) Unequivocal late introduction: Antilles (A), Canada and United States (B), Europe (C), Africa (D), Asia (E), Oceania and Pacific islands (F), South American Atlantic coast (G). (2) Potential early introduction: Pará province (H), Xingu-Tapajós province (I), Madeira province (J), Rondônia province (K), Imerí province (L), Roraima province (M), Guianan Lowlands province (N), Pantepui province (O), Paramo province (P). (3) Unequivocal early introduction: Pacific dominion (Q), Magdalena province (R), Puntarenas-Chiriquí province (S), Mesoamerican dominion (T). (4) Potential native area: Napo province (U), Ucayali province (V)

Cacao relies on mammals, like rodents and primates, and birds for dispersal in nature (Cuatrecasas 1964; Silva et al. 2010), and it has dispersion limitations due to fruit characteristics, which have affected its distribution. In this sense, it is important to make clear that the term “introduced specimen” used in this study refers to any specimen that was intentionally introduced by humans at a specific time and location, whereas the term “native area” refers to the region where only wild specimens were naturally dispersed, without human intervention.

Specimen occurrence data

The primary occurrence dataset for T. cacao used in this study was compiled as part of a larger dataset for all Theobroma and Herrania species (Colli-Silva et al. 2023a). This dataset was constructed through an extensive literature survey and incorporated data obtained from the GBIF repository (Global Biodiversity Information Facility; 2020). GBIF-mobilized data underwent rigorous review processes, including georeferencing procedures and thorough taxonomic revision of nearly nine thousand preserved specimen collections of Theobroma and Herrania species (see Colli-Silva et al. 2023a).

For this study, only preserved specimens of T. cacao were extracted from the larger dataset, and we excluded records with the same geographic location. This decision was made because preserved specimen collections provide more reliable and accurate geographic data compared to human observations, photographs, or other sources of information. The occurrence information derived from these preserved specimens has been included as Supplementary Information and can be found in Appendix S1. This data set incorporated a total of 637 locations (unique geographic point occurrences).

Acquisition of remote sensing data

All downstream analyses were conducted in R v. 4.2.1 and Python v. 3.10.2 environments (van Rossum and Drake 1995; R Core Team 2021). Satellite images were obtained iteratively from the Sentinel-2 collection within the Google Earth Engine API platform (Gorelick et al. 2017) in Python. The search was restricted to images captured between January 1, 2020, and January 1, 2022. Briefly, Sentinel-2 images provide multispectral surface reflectance data with bands in the visible and near-infrared regions of the electromagnetic spectrum at a resolution of 10 m.

For each of the 637 locations, the most suitable image for the area of interest was selected, ensuring that the chosen images had cloud coverage of less than 10%. The selected images were then reprojected to the corresponding Universal Transverse Mercator (UTM) zone, and the Normalized Difference Vegetation Index (NDVI) was calculated for a buffer area standardized as 500 × 500 m centered around each point location. The NDVI was selected because of its wide and intuitive usage in vegetation analyses (Rouse et al. 1974), which we consider appropriate for this exploratory analysis. In short, NDVI is computed as the ratio of the difference to the sum of reflectance values in the near-infrared and red regions (Rouse et al. 1974). NDVI values range from –1 to 1 and are used to classify land cover into categories, distinguishing dense forests from sparse vegetation, grasslands, water bodies, barren lands, and built-up areas.
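The NDVI ratio described above can be illustrated with a minimal sketch. It assumes the red and near-infrared reflectances have already been extracted as two equal-length sequences of pixel values; the function name `ndvi` and the example numbers are hypothetical and are not taken from the study's scripts. For Sentinel-2, the 10 m red and near-infrared bands are B4 and B8, although the text does not state which bands were used.

```python
def ndvi(red, nir):
    """Return per-pixel NDVI = (NIR - red) / (NIR + red).

    Pixels where both reflectances sum to zero are returned as None,
    because the ratio is undefined there.
    """
    values = []
    for r, n in zip(red, nir):
        total = n + r
        values.append((n - r) / total if total != 0 else None)
    return values


# Hypothetical surface reflectance values (not data from the study).
red_band = [0.05, 0.20, 0.30, 0.0]
nir_band = [0.45, 0.25, 0.10, 0.0]
ndvi_values = ndvi(red_band, nir_band)
```

Strongly vegetated pixels reflect much more near-infrared than red light and therefore approach 1, whereas water, which reflects more red than near-infrared light, yields negative values.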

NDVI values were reclassified into the following categories, based on NDVI threshold values: (1) water bodies (NDVI < 0); (2) barren lands and built-up areas (0 ≤ NDVI < 0.18); (3) grasslands and agricultural lands (0.18 ≤ NDVI < 0.27); (4) sparse vegetation (0.27 ≤ NDVI < 0.36); and (5) dense forests (NDVI ≥ 0.36). In this study, “areas of human influence” refer to regions showing indications of anthropogenic presence or influence, such as roads, deforested areas, agricultural lands, or other areas that exhibit signatures detectable through standard remote sensing analyses. Areas of human influence were standardized as those with NDVI values between 0 and 0.27 (i.e., categories 2 and 3).
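The reclassification above amounts to a simple threshold lookup, sketched below. The category labels and function names are hypothetical, and treating "between 0 and 0.27" as the half-open interval 0 ≤ NDVI < 0.27 is an assumption made so that it matches categories 2 and 3 exactly.

```python
def classify_ndvi(value):
    """Map one NDVI value to the land cover category used in the text."""
    if value < 0:
        return "water"
    if value < 0.18:
        return "barren_or_built_up"
    if value < 0.27:
        return "grassland_or_agriculture"
    if value < 0.36:
        return "sparse_vegetation"
    return "dense_forest"


def is_human_influence(value):
    """Assumed reading: human influence means 0 <= NDVI < 0.27."""
    return 0 <= value < 0.27


# Hypothetical NDVI values, one per category.
example_classes = [classify_ndvi(v) for v in (-0.2, 0.10, 0.20, 0.30, 0.60)]
```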

NDVI profile analyses

NDVI profiles were generated for each specific point location (for all 637 locations) and then contrasted across the various areas delineated in “Literature survey and study area” section. Several key variables were collected for each location, including: (i) the median NDVI value of the site, (ii) the proportional occurrence of dense vegetations, and (iii) the proportional occurrence of areas influenced by human activity. This analysis involved buffer extraction and spatial data manipulation carried out using R packages “raster” v. 2.0-12 (Hijmans 2023), “sp” v. 1.5-0 (Pebesma and Bivand 2005; Bivand et al. 2013), and “sf” v. 1.0-8 (Pebesma and Bivand 2023). The metrics were calculated from all pixels included inside the buffer taken for each site.
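As an illustration of the three per-site variables, the sketch below summarizes the NDVI pixels of one buffer. It is written in plain Python rather than with the R packages cited above, the function name `site_profile` is hypothetical, and the thresholds are those given in the “Acquisition of remote sensing data” section. At 10 m resolution, a 500 × 500 m buffer would hold roughly 50 × 50 = 2,500 pixels; the example uses only eight made-up values for brevity.

```python
from statistics import median


def site_profile(ndvi_pixels):
    """Summarize the NDVI pixels of one buffer into the three site metrics."""
    if not ndvi_pixels:
        raise ValueError("the buffer contains no valid pixels")
    n = len(ndvi_pixels)
    dense = sum(1 for v in ndvi_pixels if v >= 0.36)      # dense forest
    human = sum(1 for v in ndvi_pixels if 0 <= v < 0.27)  # human influence
    return {
        "median_ndvi": median(ndvi_pixels),
        "prop_dense": dense / n,
        "prop_human": human / n,
    }


# Hypothetical pixel values (not data from the study).
pixels = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]
profile = site_profile(pixels)
```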

In order to discern statistical differences, we investigated whether the NDVI scores within explicitly designated regions differed from the other major areas outlined in “Literature survey and study area” section. The assessment of differences between these classes employed a two-sample Kolmogorov–Smirnov test (Ks-test) with a significance threshold set at 0.05. The statistical evaluation was conducted using the “dgof” v. 1.4 package (Arnold and Emerson 2011) within R.
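For readers unfamiliar with the test, the following sketch computes a two-sample Kolmogorov–Smirnov statistic, i.e., the largest vertical distance between the two empirical cumulative distribution functions. The p-value uses a common asymptotic series approximation and may differ from the value returned by the “dgof” package, particularly for small or tied samples. The function name and the example values are hypothetical.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Return (D, approximate p-value) for a two-sample KS test."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    n1, n2 = len(a_sorted), len(b_sorted)
    d = 0.0
    # Compare the two empirical CDFs at every observed value.
    for x in a_sorted + b_sorted:
        f1 = bisect_right(a_sorted, x) / n1
        f2 = bisect_right(b_sorted, x) / n2
        d = max(d, abs(f1 - f2))
    # Asymptotic approximation with a small-sample correction.
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return d, 1.0  # the series converges poorly here; p is close to 1
    total = 0.0
    for k in range(1, 101):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    p_value = max(0.0, min(1.0, 2.0 * total))
    return d, p_value


# Hypothetical median NDVI values for two groups of sites.
late_introduction = [0.21, 0.25, 0.30, 0.33, 0.35, 0.41]
native_area = [0.38, 0.44, 0.47, 0.52, 0.55, 0.61]
d_stat, p_val = ks_two_sample(late_introduction, native_area)
significant = p_val < 0.05
```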

Addressing potential sampling biases

Specimen collections are often biased towards areas that are easily accessible and closer to regions of human influence (Oliveira et al. 2016). To ensure that our results account for these potential biases, we specifically addressed two concerns associated with our dataset. By addressing these biases associated with accessibility and anthropic influence, we aimed to enhance the robustness of our findings and provide a more comprehensive understanding of the observed patterns in our empirical data.

First, to address potential sampling bias in our cacao specimens, we conducted additional investigations on related species within the genera Theobroma and Herrania. Using the same dataset of Colli-Silva et al. (2023a), we replicated image acquisition and data analysis, focusing on these wild cacao relatives, excluding T. cacao. This expanded perspective, though still limited, allowed us to discern distribution patterns beyond our primary target species. Supplementary Information (Appendix S1) contains the dataset used for this analysis. Employing a Ks-test, we compared the sampling patterns of these wild cacao relatives with our main dataset. A significant disparity would suggest that cacao collections deviate from the sampling patterns observed in other Theobroma species, indicating that sampling bias alone does not account for the patterns observed for cacao.

Second, we aimed to address the concern that sampling tends to be concentrated in more accessible areas with higher anthropic influence, while less accessible areas with lower anthropic influence are underrepresented. To account for this potential bias, we conducted a randomization procedure by randomly swapping occurrence points in our dataset 1000 times, while maintaining the geographic range of our original records (defined as a 25 km buffer around all records). For each replicate, we obtained satellite images for the randomized points and calculated the same metrics as with the empirical data. Subsequently, we compared the results of the replicated scenarios with the empirical dataset to assess their statistical similarity. If the replicated scenarios significantly differed from the empirical dataset, it would suggest that factors other than chance influence our cacao collections, which aligns with the objectives of this study. The files containing the replicates are available as Supplementary Information (Appendix S1, S2).
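One possible reading of this randomization step is sketched below: each replicate replaces every record with a random point drawn uniformly within 25 km of that record, so all randomized points remain inside the buffered range. This is an assumption for illustration only. The exact procedure used in the study is not specified here, and the function names, the spherical Earth radius, and the coordinates are hypothetical.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    h = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def random_point_near(lat, lon, radius_km, rng):
    """Draw a point approximately uniformly from the disc around (lat, lon)."""
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    dist = radius_km * math.sqrt(rng.random())  # sqrt keeps area density even
    delta = dist / EARTH_RADIUS_KM
    p1, l1 = math.radians(lat), math.radians(lon)
    p2 = math.asin(math.sin(p1) * math.cos(delta)
                   + math.cos(p1) * math.sin(delta) * math.cos(bearing))
    l2 = l1 + math.atan2(math.sin(bearing) * math.sin(delta) * math.cos(p1),
                         math.cos(delta) - math.sin(p1) * math.sin(p2))
    return math.degrees(p2), math.degrees(l2)


def randomized_replicate(records, radius_km=25.0, seed=None):
    """Return one replicate with a random point near each original record."""
    rng = random.Random(seed)
    return [random_point_near(lat, lon, radius_km, rng) for lat, lon in records]


# Hypothetical (latitude, longitude) records, not taken from Appendix S1.
records = [(-3.10, -60.02), (-1.45, -48.50), (-0.98, -77.81)]
replicates = [randomized_replicate(records, seed=i) for i in range(1000)]
```

Each replicate would then go through the same image acquisition and NDVI profiling as the empirical records before being compared with them.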

Results

We found distinct patterns when associating the geographic distribution of cacao with land use profiles, as identified through NDVI classification. Notably, we observed significant differences between areas where cacao was introduced after the pre-Columbian era and those closer to the suggested native cacao region (Table 1). Regions where cacao was introduced after the pre-Columbian era (such as the South American Atlantic coast, North America, and other overseas locations) showed a higher occurrence of cacao specimens closer to areas affected by human activities and fewer occurrences near dense forests (Table 1; Fig. 2). Conversely, cacao specimens from regions closer to the suggested native area, as indicated by the literature (such as the Napo and Ucayali biogeographical provinces), were primarily situated away from human-influenced areas (Table 1; Fig. 2). Regions where early pre-Columbian introduction of cacao is observed, like Eastern Amazonia and Mesoamerica, displayed intermediate values for the measured variables (Table 1; Fig. 2).

Table 1 Differences in selected land use profile variables, based on point occurrence locations of cacao specimens across the globe
Fig. 2

a Median NDVI (Normalized Difference Vegetation Index) values of the areas where cacao specimens are found, categorized based on the major and minor regions defined in this study (see “Literature survey and study area” section in Methods). b Relative frequency of forested areas in the regions where cacao specimens are found. c Relative frequency of areas of human influence (as defined in “Acquisition of remote sensing data” section in Methods), according to the major and minor areas defined for this work (see “Literature survey and study area” section in Methods). Areas are defined as follows: (1) Unequivocal late introduction: Antilles (A), Canada and United States (B), Europe (C), Africa (D), Asia (E), Oceania and Pacific islands (F), South American Atlantic coast (G); (2) Potential early introduction: Pará province (H), Xingu-Tapajós province (I), Madeira province (J), Rondônia province (K), Imerí province (L), Roraima province (M), Guianan Lowlands province (N), Pantepui province (O), Paramo province (P); (3) Unequivocal early introduction: Pacific dominion (Q), Magdalena province (R), Puntarenas-Chiriquí province (S), Mesoamerican dominion (T); (4) Potential native area: Napo province (U), Ucayali province (V)

In our study, we performed two separate analyses to investigate potential sampling biases associated with our cacao specimens (see “Addressing potential sampling biases” section). Firstly, we examined the possibility of sampling bias by considering that other relatives from the same genus might exhibit a similar biased pattern. Secondly, we aimed to address the concern that, by default, botanical sampling usually is concentrated in more accessible areas with higher anthropic influence, while less accessible areas with lower anthropic influence are underrepresented. Our findings indicate that these identified biases do not solely explain the observed distribution patterns for our data. Statistical differences were found in over 98% of the replicates generated for this study, compared to the empirical data (p-values < 0.05; Appendix S2 in Supplementary Information). Moreover, when comparing the cacao dataset with its wild relatives, the median NDVI, frequency of forested areas, and frequency of anthropized areas showed statistical differences in most regions. These results indicate that factors beyond chance or the biases addressed in our study contribute to the observed distribution patterns of cacao specimens. All p-values and occurrence datasets can be found in the Supplementary Information (Appendices S1 and S2).

Discussion

Human impacts on cacao dispersal

Our results are consistent with the following scenario outlined in Fig. 2: origin of Theobroma cacao with a native range in areas U and V (suggested as the native range based on genetic diversity studies, e.g. Thomas et al. 2012); early introduction into northern South America and Central America (areas Q, R, S and T); potential early introduction to Eastern Amazonia and the Guianas (areas I-P); and late introduction to Eastern Brazil and tropical areas outside of the Americas (areas G and A-F, respectively). This scenario is closest to that outlined in Fig. 1 (Map 6).

We observed a substantial presence of cacao specimens in areas strongly influenced by human activities, providing support for the idea that human intervention played a significant role in cacao dispersal to various regions. This raises questions about the true native status of cacao in some areas. Bartley (2005) outlines possible pathways of cacao’s dispersal to the African and Asian tropics: cacao was spread from Mesoamerica to the Philippines in the seventeenth century, and Amazonian varieties were taken to Africa in the nineteenth century. Plantations in these continents were likely established based on very few individuals, and they may exhibit low genetic diversity. In the American Tropics, while cacao occurrence is often associated with human-impacted areas, recent literature identifies the native range of cacao as the primary forests of Western Amazonia (Thomas et al. 2012; Clement et al. 2015; Cornejo et al. 2018; Fouet et al. 2022). This suggests that, compared to other areas potentially influenced by human introduction, these regions harbor a higher abundance of wild cacao specimens within primary forests, far away from human settlements or urban areas. Alternatively, these findings might be attributed to the actions of indigenous populations who cultivated or promoted cacao plantations in these regions prior to modern settlements.

Theobroma species rely on mammals, such as rodents and primates, for fruit dispersal (van Hall 1914; Cuatrecasas 1964). Limited dispersal of cacao is evident due to certain species characteristics, including indehiscent fruits, flower self-incompatibility, short pollination distances, or high rates of vegetative propagation (Silva et al. 2010; Thomas et al. 2012; Levis et al. 2017). Additionally, genetic bottlenecks have been observed in introduced populations of cacao in Mesoamerica, and there is a lack of palynological records of the species in Mexico and Eastern Amazonia before the Holocene (Clement et al. 2010; Thomas et al. 2012; Osorio-Guarin et al. 2017; Cornejo et al. 2018). Furthermore, Bartley (2005) suggested that cacao might first have been used for its pulp by indigenous people and that this may have aided its dispersal, as they took fruits on their migrations into the forest, ate the pulp and spat out the seeds. A similar trend is observed within a related species, T. grandiflorum or cupuaçu (Colli-Silva et al. 2023b). These factors indicate that wild cacao populations may have faced barriers to expanding their geographic distributions over ecological time, which contrasts with the wide distribution of cacao seen today.

Interestingly, most of cacao’s genetic diversity is concentrated in the border areas of northeastern Peru, northern Bolivia, southwestern Colombia, western Brazil (Acre state), and eastern Ecuador (Motamayor et al. 2002; Thomas et al. 2012; Clement et al. 2015; Cornejo et al. 2018). This specific region is recognized as a biogeographical area of endemism (sensu Morrone 2014), delimited by the Ucayali and Napo rivers and by the Andes to the west. These rivers likely played a crucial role in the diversification of various species (Silva and Oren 1996; Hubert et al. 2007; Harvey et al. 2014; Dumont et al. 1990; Kreft et al. 2004; Morrone 2014). In particular, changes in the courses of these rivers have been discussed as a potential factor shaping species diversification in this region (Tuomisto and Ruokolainen 1997). Cacao may also have been affected by this, considering its natural history both before and after human influence.

The extent to which ancient Amazonian societies reshaped the region's landscapes remains a topic of intense debate. For instance, Levis et al. (2017) uncovered a significant link between archaeological sites and the occurrence of certain plant species. Their research revealed that domesticated species were five times more likely to dominate in these areas compared to non-domesticated ones. This trend was also consistent across the Amazon basin, with forests surrounding archaeological sites showing higher abundance and diversity of domesticated plant species. These findings underscore the substantial impact of historical plant domestication by Amazonian indigenous groups on the structure of tree communities. Clement (1989) provided a review shedding light on numerous other Amazonian crops, belonging to different botanical families, that might have undergone similar processes. The collective and increasing body of evidence from different disciplines emphasizes the legacy of ancient Amazonian peoples’ influence on the region’s flora, amplifying the significance of historical plant domestication in shaping the Amazonian landscape. Our research aligns with this narrative, offering further insights that contribute to understanding this historical legacy.

Implications for genetic resource repatriation

Our findings shed light on the fact that many areas traditionally considered as the “native” range of cacao may actually consist of introduced populations that were established before or after the pre-Columbian era and that may contain specimens that have spontaneously grown and persisted outside its native range. If this is correct, there can be significant implications for issues related to jurisdiction and access to genetic resources for crop improvement and conservation of cacao. In this sense, it would be crucial to better characterize germplasm accessions that have contributed significantly to our understanding of cacao diversity, as suggested by many authors (Bartley 2005; End et al. 2010; Laliberté 2012; Malhotra and Apshara 2014).

The principle of sovereign rights of a country over the genetic resources of plant species native to its territory is well-established in international law (Correa 1995) and widely recognized by international bodies such as the Food and Agriculture Organization (FAO) and the Convention on Biological Diversity (CBD). According to FAO, national governments have the authority to regulate access to genetic resources, which is subject to national legislation. Hence, each country should possess the right and jurisdiction over the resources native to its territory. Disputes concerning genetic resource rights and patents for Theobroma species have already arisen in some countries, such as those for the Peruvian cultivars “Chuncho” and “Cacao Amazonas Peru” (INDECOPI 2016), or the Brazilian cupuaçu (T. grandiflorum; see Rezende and Ribeiro 2009 and Colli-Silva et al. 2023b). Determining the origins and natural history of T. cacao before and after human influence, and understanding its subsequent introductions, relies on the biogeographical context summarized here, with different scenarios indicating specific countries as the native range of the species.

Based on our results (Fig. 2a–c) and on the existing evidence in the literature, the native range of cacao would be limited to areas in Ecuador, Colombia, Peru, and perhaps the westernmost part of Brazil (Acre and Western Amazonas states). Consequently, many areas within Brazil, as well as the Guianas, might harbor specimens that are not authentically native but rather cultivated or naturalized specimens in regions significantly impacted by human activities. However, this assertion requires cautious interpretation and should be further evaluated, given evidence from extensive surveys, such as those conducted in the Guianas, which describe populations that contradict this notion (Lachenaud et al. 2004; Lachenaud and Zhang 2008).

Further limitations in our methodology also require attention. Our study utilized a broad range of satellite images covering various regions and collected over a significant timeframe (see “Acquisition of remote sensing data” section of Methods). The choice of images across such diverse dates can significantly affect the NDVI values. For instance, a forest might display higher NDVI values in the wet season and lower values in the dry season. This variation could explain the high standard deviation of NDVI patterns even in nearby areas, especially in highly seasonal ecosystems, as seen in previous research on Amazonia (Silva et al. 2013). However, in our study, this issue is consistent across all images, spreading this bias evenly throughout the datasets. Moreover, limitations linked to the NDVI extend to instrumental factors, including uncertainties in satellite navigation, fluctuations in the satellite's local crossing time, and sensor degradation (Santos and Negri 1997). While potential correlations between human activities and environmental factors could be confounding, Levis et al. (2017) demonstrated that human influence alone explains roughly half of the variation in the abundance of domesticated species in certain regions.

Further research is imperative to accurately trace the origins of cacao and other wild crop species and validate the various biogeographical hypotheses outlined in our study, accounting for the history both before and after human arrival in the Americas. Levis et al. (2017) raised a significant question regarding the association between domesticated species and archaeological sites: Did humans enrich forests with domesticated species, or did they settle near naturally rich forests? Our approach cannot prove causation, but given additional supporting evidence, the former scenario appears more plausible. To facilitate such investigations, several crucial steps should be taken. Firstly, there should be a substantial increase in collecting new germplasm accessions from wild cacao populations in underrepresented areas (Sereno et al. 2006; Zhang et al. 2016). Secondly, exploring the morphological variability of T. cacao is necessary to identify potential characteristics that could define genetic clusters as distinct varieties (Motamayor et al. 2008). Lastly, historical biogeographical studies employing various analyses are essential to trace the origin of Theobroma-related species in South America, particularly in the Amazon basin. Additionally, considering the biogeography of species related to cacao, like endophytic or pathogenic fungi (Hanada et al. 2010), can offer insights into the geographic history of cacao, possibly indicating co-evolution with T. cacao or its relatives.