On the origin and dispersal of cultivated spinach (Spinacia oleracea L.)

Spinach (Spinacia oleracea L.) is an economically important crop that is cultivated and consumed worldwide. Spinach is interfertile with the wild species S. tetrandra Steven ex M. Bieb. and S. turkestanica Iljin that therefore are presumed to include the most likely crop ancestor. Here we studied variation in 60 Single Nucleotide Polymorphisms (SNP) previously identified in S. oleracea to address the issue of crop ancestry and domestication region. For this purpose we investigated 95 accessions, including 54 spinach landraces from a wide geographic area in Europe and Asia and 16 S. tetrandra and 25 S. turkestanica populations of which the majority had only recently become available. Compared to S. tetrandra substantially higher levels of amplification success and higher levels of variation were detected for S. turkestanica, indicating that S. oleracea is genetically closer to S. turkestanica than to S. tetrandra. Our phylogenetic and population structure analysis supported the conclusion that S. turkestanica is the most likely ancestor of cultivated spinach. In addition, these analyses revealed a group of S. oleracea landraces from Eastern and Southern Asia with a strong genetic resemblance to S. turkestanica. This group includes landraces from Afghanistan and Pakistan, which are part of the native distribution range of S. turkestanica. The domestication of spinach may therefore have occurred more eastwards than generally assumed. Furthermore, our study provides support for the hypothesis that after domestication, spinach was introduced into China via Nepal. Additional collecting of spinach landraces is recommended in order to allow the more precise reconstruction of the crop migration routes.


Introduction
Spinach (Spinacia oleracea L., 2n = 2x = 12) is an economically important leafy vegetable that is consumed worldwide (Morelock and Correll 2008). Although the origin of cultivated spinach is uncertain, it is believed to have been domesticated in the area of present Iran, former Persia, around 2,000 years ago (Rubatzky and Yamaguchi 1997;Morelock and Correll 2008). It is presumed that the crop has spread late in history as no references to spinach from the Greek and Roman cultures have been found (Heine 2018) and because the oldest written records in which spinach is mentioned are from the fourth century AD in Mesopotamia (El Fai‹ z 1995, as cited in Hallavant and Ruas 2014).
How spinach has spread from its center of origin to other geographic areas is largely unresolved (Simoons 1990;Hallavant and Ruas 2014). Distribution of the crop to the West possibly occurred through expansion in Muslim territories (Sneep 1983). Current evidence suggests that spinach was introduced in Europe by the Moors through the Iberian Peninsula. The first written evidence in Europe mentions spinach cultivation in Moorish Spain since the eleventh century (Al-'Awwâm 2000) and the first archaeobotanical evidence is from the Pyrenees mountain range dating back to the late 12th or early thirteenth century (Hallavant and Ruas 2014). The spreading of spinach cultivation into Central and Eastern Asia is even less distinct. The oldest written records report that spinach was introduced into China via Nepal in the seventh century (Laufer 1919). However, it remains unclear how spinach was introduced in Nepal.
The closest wild relatives of cultivated spinach are Spinacia tetrandra Steven ex M. Bieb. and Spinacia turkestanica Iljin., which both are native to areas surrounding the Caspian Sea. S. tetrandra is indigenous to the Transcaucasia and Kurdistan region, and S. turkestanica to Central and Southern Asia (Hassler 2018). S. tetrandra and S. turkestanica are morphologically similar and strongly resemble cultivated spinach. Despite their large similarity, the two wild species differ in inflorescence characteristics. Additionally, S. tetrandra male plants are considerably smaller than females, while this sexual dimorphism is not as pronounced in S. turkestanica (Ribera et al. 2020). It has been suggested that one or both of these two species may have been ancestor of cultivated spinach (Andersen and Torp 2011). However, phylogenetic studies (Fujito et al. 2015;Xu et al. 2017) and research on offspring fertility and characteristics of the sexual chromosomes (Fujito et al. 2015) indicate that the genetic relationship to S. oleracea is closer for S. turkestanica than for S. tetrandra. In both of these studies only a small number of wild populations were included. For S. tetrandra five gene bank accessions were used, including 'Ames 23,664 0 and 'PI 608,712 0 that clustered with S. turkestanica (Xu et al. 2017) and 'PI 647,859 0 , 'PI 647,860 0 and 'PI 647,861 0 that originated from collecting sites close to each other in Georgia (GRIN 2019). Also for S. turkestanica a small set of gene bank accessions from a narrow geographic region were used, including 'PI 608,713 0 and 'PI 647,862 0 that grouped with cultivated spinach in the phylogenetic study of Xu et al. (2017).
Recently, the number of available wild spinach populations has increased, mainly due to two collecting expeditions organized by the Centre for Genetic Resources, the Netherlands (CGN). Currently, a total of 49 S. tetrandra accessions from three different Transcaucasian countries (Armenia, Azerbaijan and Georgia) and 89 S. turkestanica accessions from four Central Asian countries (Kyrgyzstan, Tajikistan, Turkmenistan and Uzbekistan) are available from gene banks (EURISCO 2019;GRIN 2019;van Treuren et al. 2020). Here we readdressed the issue of most likely ancestry of cultivated spinach through analysis of Single Nucleotide Polymorphisms (SNP) using S. oleracea and a wide variety of populations of its wild relatives S. tetrandra and S. turkestanica. As our study included landraces from a wide geographic area we also aimed at specifying the domestication region of spinach.

Study material
A total of 95 Spinacia accessions from the CGN collection were selected, including 16 S. tetrandra, 25 S. turkestanica and 54 S. oleracea accessions from different geographic regions (Online Resource 1). Accession is defined as a gene bank unit of plant material with a distinct identity and may represent different specimen types, including cultivars, landraces and wild populations. In the present study, accessions of S. oleracea refer to cultivars and landraces, while accessions of S. tetrandra and S. turkestanica have once been collected from wild populations. The accessions of wild spinach were selected using the core selector tool of CGN (https:// cgngenis.wur.nl), which selects accessions based on geographical origin to theoretically maximize the level of genetic diversity captured by the selection.
Accessions lacking clear geographical origin data were disregarded. A posterior visual inspection of the mapped origin locations was performed to deselect geographically adjacent populations. Firstly, the selection of spinach landraces was based on the countries in which S. tetrandra and S. turkestanica are autochthonous (Hassler 2018). Also in these cases, accessions without information on geographical origin were disregarded, except for landraces from Iran as spinach is presumed to have been domesticated in former Persia. The geographical coordinates of the remaining accessions were mapped, and 39 landraces were selected by maximizing the number of represented countries and visually filtering adjacent origin locations within a country. Secondly, 12 landraces from the Eastern Mediterranean and the Southern and Eastern Asian regions where wild spinach is not autochthonous were selected using the core selector of CGN. Thirdly, the Western European cultivars 'Viroflay', 'Resistoflay' and 'Viking' were selected as references. The geographic origin of the study materials is presented in Fig. 1. More detailed information about the accessions is available from the website of CGN (https://cgngenis.wur.nl).
Plant raising for S. turkestanica and S. tetrandra followed the protocol of van Treuren et al. (2020) with slight modifications. Fruits of both species contain multiple seeds per fruit. S. tetrandra fruits were first broken with pliers without damaging the seeds, whereas for S. turkestanica unbroken fruits were used.
The fruits of both species were rinsed in running tap water for 4 days, after which they were sown in trays with soil and vermiculite. Subsequently, the fruits were stratified by placing the trays at 4°C for 3 days. Germination and plant growing was performed in a greenhouse at 15°C and 8 h light. These short-day conditions are applied to avoid rapid bolting of wild spinach. Seeds of cultivated spinach were directly sown in trays with soil and vermiculite. These trays were placed in a greenhouse at 15°C and 16 h light.

Experimental procedures
Cotyledons from a single randomly chosen 20-day-old seedling per accession were harvested and stored in liquid nitrogen cold-tubes (8-strip tubes) prepared for subsequent tissue grinding using metal balls. DNA extraction was performed using the CTAB-based protocol of Fulton et al. (1995). The quality of the purified DNA was assessed by 1% (w/v) agarose gel electrophoresis and the DNA concentration was determined using the Qubit dsDNA BR assay kit (Invitrogen).
The genetic markers used in our study were selected from a total of 419 SNP markers identified and used by Chan-Navarrete et al. (2016) for genetic map construction in spinach. These SNPs showed polymorphisms between the S. oleracea cultivars 'Ranchero' and 'Marabu'. Each SNP and its flanking DNA sequence (50 bp upstream and 50 bp downstream of Fig. 1 Geographical map of the origin locations of the 95 study accessions the SNP) was analyzed by a BLASTn search using the BLAST tool available from the SpinachBase database (Collins et al. 2019; https://www.spinachbase.org), which uses the cultivated spinach genome Sp75 as reference. Markers with multiple hits, no hits, indels or mismatches near the SNP position were discarded for further analysis. Subsequently, polymorphism of the SNPs and presence of additional SNPs and indel elements in the flanking region of the markers was analyzed using the transcriptomic data available at SpinachBase. SNPs were selected that were found to show polymorphism in transcripts from accessions of each of the three Spinacia species, and that were found to lack additional SNPs and indels in the flanking region of the SNPs of interest. A total of 60 SNP markers were selected for our study. The flanking DNA sequences, alleles of the selected SNPs and linkage group information are presented in Online Resource 2.
SNP genotyping was carried out by the Dr. van Haeringen Laboratory (Wageningen, the Netherlands), based on a Competitive Allele Specific PCR (KASP) assay (Semagn et al. 2014). A 96-well plate containing 200 lL of 6 ng/lL purified DNA per sample and an empty well as negative control was delivered for analysis. The primers for the KASP assay were designed by the Dr. van Haeringen Laboratory based on the flanking DNA sequences of the markers. The SNP scores per accession are presented in Online Resource 3.

Data analyses
Four SNP markers with missing data for all accessions were discarded for further analysis. Scores of the remaining 56 SNPs were transformed into a Simplified NEXUS file using Mesquite (Maddison and Maddison 2018) to study phylogenetic relationships. The NEXUS file was used to construct a maximum likelihood phylogenetic tree using IQ-TREE (Trifinopoulos et al. 2016). Parameters were set to 'substitution model = auto' and 'bootstrap = ultrafast', and 1000 bootstrap replications were used. The resulting phylogenetic tree was visualized using iTOL (Ciccarelli et al. 2006).
Population structure of the 95 Spinacia accessions was analyzed with STRUCTURE v.2.3.4. (Pritchard et al. 2000). Parameters for data uploading were set to 'individuals = 95', 'ploidy = 2', 'loci = 56 0 and 'missing value = 999 0 . Structure analyses were based on 25 replications for each presumed number of subgroups (K = 1-10). Parameters for the analyses were set to 'burn-in period = 500,000 0 , 'MCMC repetitions = 750,000 0 and 'ancestry model = admixture', and a correlated allele frequency model (Porras-Hurtado et al. 2013) was used. CLUMPAK (Kopelman et al. 2015) was used to calculate the optimal number of subgroups by means of the methods described by Evanno et al. (2005) and to obtain the population structure bar plots.

SNP genotyping
Out of the 60 selected SNP markers, 56 were amplified in at least part of the study accessions (Online Resource 3). Among these successful markers, nine failed in all S. tetrandra samples, whereas amplification was observed for each of the 56 SNPs in S. turkestanica and S. oleracea. All 56 markers were found polymorphic in S. oleracea, while 55 showed variation in S. turkestanica. Polymorphism in S. tetrandra was observed for only two SNPs. The level of heterozygosity was 0.024 for S. tetrandra, while the values observed for S. turkestanica (0.273) and S. oleracea (0.315) were more than ten times as high (Table 1).

Phylogenetic relationships
Phylogenetic analysis revealed a closer genetic relationship of S. oleracea with S. turkestanica than with S. tetrandra (Fig. 2). The phylogenetic tree showed a clear separation between the three species, except for four S. turkestanica populations ('turk TJK62 0 , 'turk TKM90 0 , 'turk TKM92 0 and 'turk TKM95 0 ) that clustered within the S. oleracea group.
Zooming in on the geographical origin of the S. oleracea accessions (Fig. 2), a small cluster of Eastern and Southern Asian landraces (hereafter denoted as the Eastern group) was separated from a larger cluster of Asian and European accessions (hereafter referred to as the Western group). The Eastern group comprised landraces from Afghanistan, Pakistan, Nepal, India and China, while the Western group shows a much wider geographical composition. Apart from a few small clades, no clear geographical clustering was observed within the Western group. The reference cultivars 'Viroflay' (oler FRA18) and 'Resistoflay' (oler NLD16) clustered together in the phylogenetic tree. Interestingly, the reference cultivar 'Viking' (oler NLD17) was not found close to the other European reference cultivars, but instead closely clustered with

Population structure
Structure analysis showed that the largest delta K was observed when two subgroups (K = 2) are presumed (Online Resource 4). For K = 2 the first subgroup consists of the S. tetrandra and S. turkestanica populations and the ten S. oleracea landraces from the Eastern group, while the second subgroup included a variety of landraces of Asian or European origin and the three reference cultivars (Fig. 3). The S. oleracea landraces from Western and Southern Asia were found to constitute admixtures of both subgroups, the Southern Asian landraces showing a closer genetic relationship to the first subgroup than the Western Asian landraces. The S. turkestanica populations 'turk TKM92 0 and 'turk TKM95 0 were also found to represent admixtures of both subgroups. When three subgroups (K = 3) were presumed, the basic population structure remained largely unchanged, with the exception that the S. tetrandra populations formed a distinct separate subgroup.

Ancestry of cultivated spinach
Spinacia tetrandra and S. turkestanica are the closest wild relatives of cultivated spinach. Previous phylogenetic studies suggested that S. oleracea originated from the domestication of S. turkestanica (Fujito et al. 2015;Xu et al. 2017). This hypothesis was supported by the present study, using a wide variety of populations of S. tetrandra and S. turkestanica that have recently become available. As the KASP markers selected for our study were solely based on two S. oleracea cultivars, the higher amplification success observed in S. turkestanica as compared to S. tetrandra is most likely related to its smaller evolutionary distance with cultivated spinach. Amplification failure Asia, Caucasus = Western Asia (Caucasus region), SE = Southern Europe (Balkans) and MODERN_CV = modern cultivars from Western Europe. The prefixes EAST and WEST correspond to the, respectively, Eastern and Western group of accessions observed in the phylogenetic tree (Fig. 2) due to mismatches in primer binding sites is commonly observed when primers designed for a species are used in another species, and the effect generally reduces with increasing genetic relatedness between the species (Housley et al. 2006;Zou et al. 2020). The closer genetic relationship between S. oleracea and S. turkestanica was also shown by their similar degree of polymorphism and level of heterozygosity, which contrasted with the substantially lower extent of variation detected in S. tetrandra. These differences in estimates of genetic diversity are most likely due to ascertainment bias, an effect that occurs when marker data are not collected from a random sample of polymorphisms in the study population (e.g. Heslot et al. 2013). As the effects of ascertainment bias are known to increase with larger evolutionary distance between samples, our genotyping results indicate that S. oleracea is genetically closer to S. turkestanica than to S. tetrandra. Also our phylogenetic and population structure analyses showed that S. turkestanica is the most likely ancestor of S. oleracea and that S. tetrandra is evolutionary more distant. In the phylogenetic analyses the group of S. turkestanica populations clustered closely with a group of S. oleracea landraces from Southern and Eastern Asia. As this area is geographically closer to the native distribution range of S. turkestanica in Central and Southern Asia than to that of S. tetrandra in the Caucasus, the domestication history of cultivated spinach also makes sense from a geographical point of view. Interestingly, the population structure analysis assigned S. tetrandra, S. turkestanica and the Eastern group of S. oleracea to the same subgroup when two subgroups were assumed. When the number of assumed subgroups was increased to three, S. tetrandra was distinguished as a separate subgroup, while S. turkestanica and the Eastern group of S. oleracea remained together in the analysis.

Taxonomic uncertainties
In our phylogenetic analysis the S. turkestanica populations 'turk TJK62 0 (CGN24957), 'turk TKM90 0 (CGN25141; PI 647,864), 'turk TKM92 0 (CGN25143; PI 662,295) and 'turk TKM95 0 (CGN25274; PI 647,862) clustered with cultivated spinach. A closer genetic relationship of 'PI 647,862 0 with S. oleracea was also found by Xu et al. (2017) and current accession information indicates that this accession shows mixed traits of wild and cultivated spinach (GRIN 2019). Also 'PI 662,295 0 shows traits of cultivated spinach, such as a smooth seed shape (GRIN 2019), although this accession did not cluster with S. oleracea in the study of Xu et al. (2017). For these two samples also our population structure analysis indicated admixture of S. turkestanica and S. oleracea (Fig. 3). The combined results for 'PI 662,295 0 and 'PI 647,862 0 call for modification of the passport data of CGN25143 and CGN25274 in the CGN database. CGN24957 is described as a ruderal population (https://cgngenis.wur.nl) and may have hybridized with cultivated spinach if its collecting site has been disturbed by humans. Our population structure analysis revealed admixture between S. turkestanica and S. oleracea when at least four subgroups were assumed (results not shown). Our results for CGN25141 are not in line with those of Xu et al. (2017), who did not observe taxonomic uncertainty for 'PI 647,864 0 . Both for CGN24957 and CGN25141 additional data are needed to clarify their taxonomic status. If the four taxonomic uncertainties would be removed from the analysis, the number of polymorphic markers in S. turkestanica would decrease from 55 to 49. However, this has no effect on our conclusions regarding the ancestry of cultivated spinach as this reduced number still exceeds the two polymorphic markers observed for S. tetrandra.

Geographic origin of cultivated spinach
It is generally believed that spinach has been domesticated in Iran, former Persia. The geographical area of what is commonly known as Persia has changed throughout history, and is sometimes roughly referred to as current Iran (e.g. Dandamaev 1989). Some authors use the name ''Greater Iran'' to refer to the regions over which Persian dynasties have had influence throughout history (Frye 1962). For this reason, it might be confusing to use Iran and Persia as synonyms, especially when referring to previous periods in history. In our phylogenetic and population structure analyses the seven landraces from Iran appeared less closely related to S. turkestanica than landraces belonging to the Eastern group of S. oleracea accessions. This group includes landraces from Afghanistan and Pakistan, which are part of the native distribution range of S. turkestanica (Hassler 2018) and part of Greater Iran (Dandamaev 1989). Afghanistan and Pakistan also belong to one of the primary origin centers of cultivated plants (Ladizinsky 1998). Therefore, our results may indicate that spinach has domesticated more eastwards from current Iran.

Crop differentiation
Two types of spinach cultivars are commonly distinguished based on geographical origin, namely Asiantype and Western-type spinach (e.g . Simoons 1990;van der Vossen 2004). The two types have their own morphological characteristics and differ in aspects such as leaf shape, petiole and leaf color and bolting tendency (van der Vossen 2004). Previous phylogenetic studies suggested the existence of an association between the genetic background and the geographical origin of spinach accessions, including many commercial cultivars (Göl et al. 2017;Kuwahara et al. 2014;Shi et al. 2017;Xu et al. 2017). In these studies phylogenetic analysis separated Asian and Western spinach cultivars in different clusters. Our study also distinguished two groups of cultivated spinach, namely an Eastern group with accessions from Southern and Eastern Asia, and a Western group encompassing the rest of the study samples, including landraces from Central and Western Asia and Europe as well as the three European reference cultivars.
Some spinach accessions in our study classified as landraces showed a strong genetic resemblance with modern cultivars, thereby constraining the detection of potential associations between geography and phylogeny. Further study of spinach accessions may improve our understanding of the relationship between spinach landraces as well their relationship with modern cultivars.

Crop dispersal
It has been suggested that spinach was introduced in China via Nepal (Laufer 1919). The Eastern group of cultivated spinach included closely related landraces from China, Nepal and India, which could be regarded support for the suggested migration route via Nepal. However, as Nepal does not belong to the distribution area of S. turkestanica, spinach must have been introduced in the country from elsewhere. As the Eastern group also included landraces from Afghanistan and Pakistan, cultivated spinach may have been introduced in Nepal from this region after domestication. Analysis of landraces from more east Asian countries should shed more light on the most likely dispersal routes of spinach to the East. A phylogenetic relationship between landraces from the Middle East and Europe could not be demonstrated in our study. How cultivated spinach has dispersed from the domestication area to Europe remains subject of further study. Genetic analyses to address this question would highly benefit from the inclusion of landraces from Northern Africa and the Western Mediterranean, as well as from more landraces from the Eastern Mediterranean. Unfortunately, landraces from these regions are rare or lacking in current gene bank collections, and therefore constitute main targets for future collecting expeditions.

Concluding remarks
In the present study we used a relatively small set of SNP markers derived from the genome sequences of S. oleracea to address genetic issues in cultivated spinach and its main wild relatives. In cooperation with several breeding companies we have started a research project to develop reference genomes for S. turkestanica and S. tetrandra, followed by the resequencing of genetic resources accessions of both species (https://kia-landbouwwatervoedsel.nl/19089-2/). The availability of large-scale sequencing data is expected to open the door for more in-depth studies on the origin, distribution and population structure of cultivated spinach in relation to its wild relatives S. turkestanica and S. tetrandra, which will facilitate the maximum use of genetic diversity in the breeding of new spinach cultivars.
Author contributions This study was initiated by CK. All authors contributed to the study conception and design. Data collection and analysis were performed by AR, who also prepared the first draft of the manuscript. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding The contribution of RvT and CK was carried out in the framework of the Programme Genetic Resources (WOT-03) and the Fundamental Research Programme 'Circular and Climate Neutral' (KB-34-013-001) both funded by the Dutch Ministry of Agriculture, Nature and Food Quality. The funding source did not play a decisive role in any part of the study, nor in the preparation of the manuscript.

Compliance with ethical standards
Conflicts of interest The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Availability of data and material All data generated or analyzed during this study are included in this published article and its supplementary information files.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.