Analyses of genetic diversity and population structure of anchote (Coccinia abyssinica (Lam.) Cogn.) using newly developed EST-SSR markers

Anchote (Coccinia abyssinica (Lam.) Cogn.) is a perennial root crop belonging to the Cucurbitaceae family. It is endemic to Ethiopia and distributed over a wide range of agro-ecologies. For further improvement and efficient conservation of this crop, characterization of its genetic diversity and its pattern of distribution is a vitally important step. Expressed sequence tag-simple sequence repeat (EST-SSR) markers were developed from publicly available watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai] ESTs in the GenBank database. Among those novel markers, eight were polymorphic and were subsequently used for genetic diversity and population structure analyses of 30 anchote accessions collected from western Ethiopia. A total of 24 alleles were obtained across the eight polymorphic loci and 30 accessions, revealing a moderate level of genetic diversity in this minor crop. Among the eight loci, locus CA_06 was the most informative, with six alleles and a polymorphic information content (PIC) of 0.76. The accessions showed about threefold variation in genetic diversity, with expected heterozygosity (He) ranging from 0.15 (accession An) to 0.44 (accession Dg). Other accessions with higher genetic diversity include Ar and Gu (He = 0.43 and 0.41, respectively). Analysis of molecular variance (AMOVA) revealed that the variation within accessions and among accessions accounted for 84.7% and 15.3% of the total variation, respectively. The study revealed low but significant population differentiation in this crop, with no clear pattern of population structure. The EST-SSR markers developed in this study are the first of their kind for anchote and can be used for characterization of its wider genetic resources for conservation and breeding purposes.


Introduction
Anchote (Coccinia abyssinica (Lam.) Cogn.) is a perennial root crop that belongs to the Cucurbitaceae family (Hora et al. 1995). It is endemic to Ethiopia and distributed over a wide range of agro-ecologies, adapting to altitudes as low as 550 m above sea level (masl) and as high as 2800 masl (Getahun 1973). The plant grows well in areas where annual rainfall is between 950 and 2000 mm (Getahun 1973; Westphal 1974). This species can reproduce both sexually and, through its root tubers, asexually. Although anchote is monoecious, the main mode of sexual reproduction is outcrossing, due to protandry. Asexually, it undergoes an annual cycle in which its herbaceous shoots die back and new shoots then emerge from its "everlasting" rootstock at the onset of the rainy season. Sowing seeds is the preferred means of anchote cultivation, as the crop produces hundreds of seeds per fruit (Hora et al. 1995; Wondimu et al. 2014).
Anchote tuber is rich in protein and calcium, with a low content of anti-nutritional factors, and is hence highly recommended as human food (Hora et al. 1995). It has also been used as a traditional medicine in Ethiopia to treat conditions such as bone fracture, backache and displaced joints, and diseases such as gonorrhea, tuberculosis and cancer (Dawit and Estifanos 1991; Gelmesa 2010). Even though anchote is an economically, nutritionally and medicinally valuable crop plant, there is limited information to date regarding its genetics, breeding, best agronomic practices and phylogeography. Recent studies on its nutritional contents (Desta 2011), morphological traits (Wondimu et al. 2014) and genetic diversity using ISSR markers (Bekele et al. 2014) revealed key information that can be used for its improvement. However, these studies targeted only limited geographical areas where this crop is currently grown in Ethiopia, and hence their sample size and geographic coverage did not fully represent the wider gene pool of anchote in the country.
In order to harness the nutritional value and other benefits of this crop, quantifying its genetic diversity is a primary step. At present, single nucleotide polymorphism (SNP) markers are the markers of choice, as they have genome-wide coverage and are abundant in genic regions of most studied crops. However, such molecular markers require significant investment in terms of resources and time and hence are not an immediate option for an orphan crop like anchote. Alternatively, expressed sequence tag-simple sequence repeats (EST-SSRs) are the second best option, as they are quicker and cheaper to develop and use (Ellis et al. 2006). EST-SSR markers are easy to develop, multi-allelic, transferable across genera and derived from the expressed portion of the genome. They are also among the markers of choice for QTL analysis to identify genes driving traits of agronomic interest (Ellis and Burke 2007). Therefore, developing such markers for anchote would lead to an enhanced understanding of its genome in general. In particular, such molecular tools can guide in-situ and ex-situ conservation efforts in Ethiopia and also play a role in breeding new varieties. Therefore, the present study was aimed at developing new EST-SSR markers for anchote and using them to assess the genetic diversity and population structure of anchote accessions grown in western Ethiopia, where this crop is a staple food.

Plant material
Seed samples were collected from anchote accessions in 2013 from areas where the plant exists in nature and where it is widely cultivated by small-scale farmers (Fig. 1). Thirty accessions, each represented by five individual plants, were selected based on their geographic distribution in the sampling area, and their seeds were planted at Holeta Agricultural Research Center, Ethiopian Institute of Agricultural Research, located 41.7 km west of Addis Ababa, during the 2013/14 crop growing season. Three weeks after planting, young and healthy leaves were sampled separately from each seedling representing the 30 accessions. Leaf samples were kept in plastic zipper bags containing sufficient silica gel for efficient desiccation and stored at room temperature until DNA extraction was conducted.

DNA extraction
Each silica gel-dried leaf sample was transferred to a 2 ml Eppendorf tube containing two glass beads and was ground to a fine powder using a Mixer Mill MM 400 (Retsch GmbH, Germany). DNA was extracted from the ground samples following the protocol described in Geleta et al. (2012). The quality and quantity of the extracted DNA were assessed by agarose gel electrophoresis (1.5% (w/v)) and NanoDrop measurement. The genomic DNA extracted from four of the samples was of poor quality and was consequently discarded. Hence, genomic DNA from 146 individuals representing the 30 accessions was used in this study (Table 1).

EST-SSRs screening and PCR
Watermelon [Citrullus lanatus (Thunb.) Matsum. & Nakai] expressed sequence tags were downloaded from the National Center for Biotechnology Information (NCBI) database (https://www.ncbi.nlm.nih.gov/nuccore). Then, WebSat (Martins et al. 2009), a web software for microsatellite marker development, was used to identify ESTs containing simple sequence repeats (di-, tri-, tetra-, penta- and hexanucleotide repeats). This was followed by the use of Primer3 (Rozen et al. 2000), an online primer design program, to design primer-pairs targeting the SSR-containing regions of the ESTs. In the end, forty-seven primer-pairs were successfully designed and tested to determine their potential for amplifying the homologous DNA regions in the anchote genome. Genomic DNA from five randomly selected anchote accessions was used to test these primer-pairs.
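The SSR screening step described above can be illustrated with a minimal Python sketch. This is not WebSat's algorithm; it is a simple regular-expression search for perfect repeats of 2 to 6 bp motifs. The minimum repeat number (`MIN_REPEATS`), the function name `find_ssrs` and the example sequence are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical threshold: the repeat-number cut-offs used in the study
# are not stated in the text.
MIN_REPEATS = 5

# A motif of 2-6 bases (matched lazily, so the shortest motif is tried first)
# followed by at least MIN_REPEATS - 1 further copies of itself.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (MIN_REPEATS - 1))


def find_ssrs(sequence):
    """Return (motif, repeat_count, start_index) for each perfect SSR."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip mononucleotide runs such as AAAAAAAAAA
        repeats = len(match.group(0)) // len(motif)
        hits.append((motif, repeats, match.start()))
    return hits


# Made-up EST fragment containing an (AG)8 and a (GAA)6 repeat.
est = "ACGTACGGTC" + "AG" * 8 + "TCCGATGC" + "GAA" * 6 + "TCGA"
for motif, repeats, start in find_ssrs(est):
    print(f"({motif}){repeats} at position {start}")
```

In practice, the flanking sequence on both sides of each hit would then be passed to a primer design tool such as Primer3.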
The total reaction volume for the amplification of the EST-SSRs was 25 µl, composed of 2.5 µl of 10× PCR buffer (10 mM Tris-HCl pH 8.3, 50 mM KCl, 1.25 mM MgCl2), 1.5 µl of 25 mM MgCl2, 0.3 µl of 25 mM dNTPs, 0.75 µl of 10 µM of each of the forward and reverse primers, 1.0 U (0.2 µl) of Dream Taq DNA polymerase (Sigma, Germany), 2.5 µl of 10 ng template DNA, and 16.5 µl Millipore water. PCR amplification was performed in 96-well plates using a Thermal Cycler (S1000™) machine with the following touchdown PCR conditions: initial preheating at 95°C for 3 min, then nine cycles of denaturation at 94°C for 30 s, annealing at 58°C for 45 s and primer extension at 72°C for 45 s. The latter three steps were repeated nine times, reducing the annealing temperature by 1°C for each subsequent touchdown cycle. This was followed by 30 cycles of denaturation at 94°C for 30 s, primer annealing at 48°C for 45 s and primer extension at 72°C for 45 s. At the end of these cycles, an additional 3 min primer extension was done at 72°C. To check the quality and size of the amplification products, a sub-sample from each PCR reaction was loaded onto a 1.5% agarose gel containing ethidium bromide and electrophoresed using 1× TAE buffer. The gels were photographed using a gel documentation system, and the fragment sizes were compared with a 50 bp DNA ladder (GeneRuler™, Fermentas Life Sciences), which was used as the molecular size standard. Out of the 47 EST-SSR primer-pairs tested, 12 primer-pairs amplified a maximum of two DNA fragments of expected size per sample and hence were selected for use in this study. The description of these 12 primer-pairs is provided in Table 2. The forward primers were 5′-end labeled with carboxyfluorescein (6-FAM™) or hexachlorofluorescein (HEX™) fluorescent dyes, whereas the reverse primers were tailed with GCTTCT to reduce polyadenylation and improve genotyping, as described in Ballard et al. (2002).
These 12 labeled primer-pairs were used for PCR amplification of the DNA samples extracted from the 146 individuals representing the 30 accessions (Table 2).
The amplified products were multiplexed into panels based on the fragment size differences and the type of fluorescent dye of the forward primers. This was followed by capillary gel electrophoresis using an ABI Prism® 3730xl genetic analyzer (Applied Biosystems). Peak identification and fragment size determination, based on the GeneScan-500 LIZ internal size standard, were done using GeneMarker ver. 2.4.0 (SoftGenetics).
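As a quick consistency check on the PCR reaction mix described above, the short Python sketch below sums the listed component volumes and derives final concentrations with the dilution relation C1V1 = C2V2. The dictionary keys and the helper name `final_concentration` are illustrative only; the volumes and stock concentrations are those given in the text, and the MgCl2 figure covers only the separately added MgCl2, not the amount contributed by the buffer.

```python
import math

# Volumes (in µl) of each component of one reaction, as listed in the text.
reaction_mix_ul = {
    "10x PCR buffer": 2.5,
    "25 mM MgCl2": 1.5,
    "25 mM dNTPs": 0.3,
    "10 µM forward primer": 0.75,
    "10 µM reverse primer": 0.75,
    "Taq DNA polymerase": 0.2,
    "template DNA": 2.5,
    "Millipore water": 16.5,
}


def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Dilution relation C1 * V1 = C2 * V2, solved for C2."""
    return stock_conc * stock_volume_ul / total_volume_ul


total_ul = sum(reaction_mix_ul.values())
primer_um = final_concentration(10, 0.75, total_ul)     # µM, each primer
dntp_mm = final_concentration(25, 0.3, total_ul)        # mM
added_mgcl2_mm = final_concentration(25, 1.5, total_ul)  # mM, added MgCl2 only

print(f"total volume: {total_ul:.2f} µl")
print(f"each primer: {primer_um:.2f} µM, dNTPs: {dntp_mm:.2f} mM, "
      f"added MgCl2: {added_mgcl2_mm:.2f} mM")
```

The component volumes add up to the stated 25 µl total, which gives roughly 0.3 µM of each primer in the final reaction.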

Data analyses
Various genetic diversity parameters were analyzed using different software. Observed number of alleles (Na), effective number of alleles (Ne), percentage of polymorphic loci (%PL), Shannon-Weaver diversity index (I), observed heterozygosity (Ho), expected heterozygosity (He) and gene flow (Nm) were analyzed using POPGENE ver. 32 software (Yeh and Yang 1999). DARwin ver. 6.0.112 software (2014) was used for cluster analysis of the 30 accessions using the neighbor-joining method. POWERMARKER ver. 3.25 software (Liu and Muse 2005) was used to calculate the polymorphic information content (PIC) for each locus (Anderson et al. 1993). Analysis of molecular variance (AMOVA) was calculated using ARLEQUIN ver. 3.01 software (Excoffier et al. 2005). HP-Rare ver. 1.0 software (Kalinowski 2005) was used to calculate allelic richness and private alleles, allele frequency per locus, as well as gene diversity per locus and per population. The Excel add-in GenAlEx ver. 6.502 (Peakall and Smouse 2004) was used for pair-wise population differentiation and pair-wise genetic distance.
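To make the per-locus parameters listed above concrete, the following Python sketch computes them from a vector of allele frequencies using the commonly used textbook definitions: He = 1 - Σp², Ne = 1/Σp², I = -Σp ln p, and the Botstein-type PIC = 1 - Σp² - ΣΣ 2pi²pj². These definitions are assumptions; the exact estimators (for example, sample-size corrections) applied by POPGENE and POWERMARKER may differ. The example frequencies describe a hypothetical biallelic locus with a major allele at 0.97 and are for illustration only.

```python
import math
from itertools import combinations


def expected_heterozygosity(freqs):
    """Gene diversity, He = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def effective_alleles(freqs):
    """Effective number of alleles, Ne = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def shannon_index(freqs):
    """Shannon information index, I = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    pair_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - pair_term


# Hypothetical biallelic locus (illustrative frequencies only).
freqs = [0.97, 0.03]
print(f"He = {expected_heterozygosity(freqs):.2f}")
print(f"Ne = {effective_alleles(freqs):.2f}")
print(f"I = {shannon_index(freqs):.2f}")
print(f"PIC = {pic(freqs):.2f}")
```

Because PIC subtracts an extra pair term from He, it is always at most He, and the two converge for loci with many alleles at similar frequencies.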
The population structure analysis by clustering method and the determination of the optimum number of clusters (K) were conducted using STRUCTURE ver. 2.3.4 software (Pritchard et al. 2000) using the admixture model (Gilbert et al. 2012). The STRUCTURE analysis was conducted for K = 2 to K = 25 with a burn-in period of 200,000, 200,000 Markov chain Monte Carlo (MCMC) replications, and 10 runs at each K. The optimum K value (maximum ΔK), an estimate of the number of different genetic groups, was determined by the STRUCTURE HARVESTER software according to Evanno et al. (2005). Cluster alignment across the replicates was done using CLUMPP software (Jakobsson and Rosenberg 2007), and the population clusters were visualized by DISTRUCT software (Rosenberg 2004).
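The ΔK criterion used above to choose the optimum K can be sketched in a few lines of Python. The sketch assumes the usual form of the Evanno statistic, ΔK = |L(K+1) - 2L(K) + L(K-1)| / sd[L(K)], evaluated here from the mean log-likelihood L(K) over replicate runs and the sample standard deviation at each K. The function name `delta_k` and the log-likelihood values are hypothetical; they are not results from this study.

```python
from statistics import mean, stdev


def delta_k(ln_prob_by_k):
    """Map each interior K to |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)).

    ln_prob_by_k maps K to the list of Ln P(D) values from replicate runs.
    L(K) is the mean over runs; sd is the sample standard deviation, so
    each K needs at least two runs whose values are not all identical.
    """
    means = {k: mean(values) for k, values in ln_prob_by_k.items()}
    result = {}
    for k in sorted(ln_prob_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(ln_prob_by_k[k])
    return result


# Made-up Ln P(D) values for three replicate runs at each K.
ln_prob = {
    2: [-3000, -3002, -2998],
    3: [-2900, -2903, -2897],
    4: [-2700, -2701, -2699],
    5: [-2690, -2700, -2680],
    6: [-2685, -2700, -2670],
}
dk = delta_k(ln_prob)
best_k = max(dk, key=dk.get)
print(dk, "optimum K:", best_k)
```

ΔK is undefined for the smallest and largest K tested, since both neighbours are needed, which is why the tested range has to extend beyond the K values of interest.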

Results
In the present study, twelve EST-SSR markers were developed for anchote, of which nine were di-nucleotide repeats and the remaining three were tri-nucleotide repeats (Table 2). Four of these loci (CA_03, CA_04, CA_05 and CA_12) were monomorphic (Table 2). The other eight loci showed different levels of polymorphism, and their data were used for genetic diversity and population structure analyses of the 30 accessions. A total of 24 alleles were identified across the 146 samples and the eight loci (Table 3). The observed number of alleles per locus (Na) ranged from two to six, with an average of three alleles per locus. The effective number of alleles (Ne) ranged from 1.07 (locus CA_08) to 4.70 (locus CA_06), with an average of 1.90 (Table 4). The frequency of the alleles across all accessions varied from 0.01 (alleles A and D of locus CA_01) to 0.97 (allele B of locus CA_08) (Table 3). Locus CA_06 was the most polymorphic locus, with six alleles and a polymorphic information content (PIC) of 0.76, whereas the lowest PIC (0.06) was recorded for locus CA_08. The highest (1.64) and lowest (0.14) Shannon's information index (I) values were recorded for loci CA_06 and CA_08, respectively. The eight loci had mean PIC and I values of 0.31 and 0.63, respectively (Table 4). Among the loci with only two alleles, locus CA_11 was the most informative, with a PIC value of 0.37. Overall, among the eight loci, CA_06 and CA_08 had the highest and lowest values of these parameters, respectively (Table 4).
Gene diversity (He) estimated across the accessions for each locus varied from 0.06 (locus CA_08) to 0.79 (locus CA_06) with a mean of 0.37, whereas observed heterozygosity (Ho) varied from 0.01 (locus CA_08) to 0.84 (locus CA_11) with a mean of 0.35 (Table 5). Loci CA_01, CA_07 and CA_08 indicated significant inbreeding of individuals within each accession (FIS), whereas all loci except CA_10 and CA_11 showed significant inbreeding of individual genotypes when compared to random association of alleles in the whole population (FIT). CA_02, CA_06 and CA_08 were the major contributing loci for significant population differentiation (FST). The mean total (Ht), within-accession (Hs) and among-accession (DST) gene diversity were 0.37, 0.35 and 0.02, respectively. The overall average population differentiation (FST) and gene flow (Nm) were 0.11 and 2.06, respectively (Table 5).
The minimum frequency of the major allele (FMA) observed was 0.2, for accession Ld at locus CA_10.
The alleles are named from "A" to "F" in order of increasing allele size within the observed allele size range (OASR) provided in Table 1 for each locus Table 2). Of the accession-pairs, 12.9% had a genetic distance above 0.2 between them, whereas 34.9% had a genetic distance below 0.1 (data not shown). Accession Ds was the most differentiated, with a mean genetic distance of 0.30 from the other accessions, whereas accessions Ar and Ac were the least differentiated, with a mean genetic distance of 0.09. The analysis of pair-wise FST revealed that 12.9% of the accession-pairs were significantly differentiated (Supplementary Table 2).
Analysis of molecular variance (AMOVA) of the genotypic data of the 30 accessions partitioned the total variance into a within-accession (among individuals) component and an among-accession component, which accounted for 84.65% and 15.35% of the total variance, respectively. The analysis revealed a significant population differentiation, with an FST value of 0.15 (P < 0.001) (Table 7). Furthermore, the genetic relationship between the 30 accessions was determined through neighbor-joining cluster analysis based on Nei's standard genetic distance between the accessions. The cluster analysis revealed three major clusters, with clusters I, II and III comprising 25, 2 and 3 accessions, respectively (Fig. 2), but failed to show any clear association with geographic origin. The optimal number of genetic clusters (K) revealed through the admixture model-based population structure analysis was four (Fig. 3a). Hence, the 146 individuals of the 30 accessions most likely originated from four genetic populations. The analysis clearly suggested low differentiation between the accessions, as all accessions have alleles originating from the four genetic populations (clusters), as shown by the graphical representation of the population genetic structure of the 30 accessions at K = 4 (Fig. 3b).
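The relationship between the reported differentiation and gene flow values can be illustrated with a short Python sketch. It assumes the standard island-model approximation Nm = 0.25(1 - FST)/FST and the decomposition DST = Ht - Hs; whether the software applied exactly these forms, and to unrounded values, is an assumption. The function name `gene_flow_from_fst` is illustrative.

```python
import math


def gene_flow_from_fst(fst):
    """Island-model approximation: Nm = 0.25 * (1 - FST) / FST."""
    if not 0 < fst < 1:
        raise ValueError("FST must lie strictly between 0 and 1")
    return 0.25 * (1 - fst) / fst


# Rounded values as reported in the text.
ht, hs, fst = 0.37, 0.35, 0.11

dst = ht - hs                 # among-accession gene diversity
nm = gene_flow_from_fst(fst)  # migrants per generation

print(f"DST = {dst:.2f}")
print(f"Nm = {nm:.2f}")
```

With the rounded FST of 0.11 this gives an Nm of about 2.02, close to the reported 2.06; the small difference is what would be expected if the published value was computed from an unrounded FST.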

Discussion
Anchote has good potential to share the burden of cereals and other crops as an additional calorie source in Ethiopia. The fact that anchote is a drought-tolerant tuber crop (Getahun 1973) makes it an ideal candidate for the current erratic climatic conditions characterized by frequent droughts. However, despite its attributes in terms of economy, nutrition and resilience, it has yet to receive research attention and is only semi-domesticated (Dawit and Estifanos 1991; Hora 1995; Gelmesa 2010). In the present study, a total of 12 EST-SSR markers were developed from publicly available EST sequences of watermelon. The fact that these markers were able to amplify fragments of the expected size in anchote opens the door to adopting other genomic tools, such as SNPs/indels developed in related species like watermelon, for anchote gene pool characterization and improvement. In addition, these newly developed EST-SSR markers can be a vital tool for determining the taxonomic status of the species and subspecies of the genus Coccinia.
Among the newly developed EST-SSR markers, four were monomorphic across the 146 individuals included in this study and were hence excluded from further diversity analyses. These monomorphic loci may turn out to be polymorphic if additional anchote accessions from wider geographic areas are included, because analysis of a wider gene pool of a crop species facilitates the identification of rare alleles (Kalinowski 2004). Rare alleles within genes of significant function are usually associated with out-breeding species, like anchote, where they occur in heterozygous form; otherwise, they can be deleterious in their homozygous state (Frankham 2002). Hence, further study is required to elucidate the significance of these loci in population genetic analyses of anchote.
The two most commonly occurring types of microsatellites in flowering plant species are di-nucleotide and tri-nucleotide repeats (Simko 2009; Cavagnaro et al. 2010; Wen et al. 2010; Dillon et al. 2014). In the present study, di-, tri-, tetra-, penta- and hexanucleotide repeat EST-SSRs were included in the initially targeted 47 loci, although most of them were di- and trinucleotide repeat SSRs. However, only the dinucleotide and trinucleotide repeat SSRs were successful in amplifying homologous regions in anchote (Table 1), suggesting higher transferability of these two groups of SSRs than of the other groups. However, changes in the repeat number of di-nucleotide repeat SSRs cause frameshift mutations that may alter the associated protein and/or its function (Metzgar et al. 2000). Interestingly, the highest allelic richness in the present study was  (Kong et al. 2006), Cucurbitaceae species (Reddy 2009) and Cucurbita pepo (Mao et al. 2014).
The moderate genetic diversity observed in this study indicates that strong selection pressure, probably farmers' intervention to improve yield and related traits, may have reduced polymorphism. A similarly narrow genetic base was observed in watermelon cultivars when genotyped with similar or other types of molecular markers (Wang 2011). Therefore, in future breeding efforts, the accessions with the highest genetic diversity among the 30 accessions, such as Dg, Ar and Gm, should be considered for diversifying the genetic base of cultivated varieties and for incorporating resilience genes capable of withstanding different abiotic stresses. In contrast, An, Dh and Gc were the accessions with the lowest genetic diversity, which might be attributed to recent adaptation of anchote to the areas where these accessions were collected (founder effect) or to a recent bottleneck (Hawks et al. 2000).
The highest values of fixation indices for inbreeding (FIS and FIT) were recorded for loci CA_08 and  (Table 4). These loci have higher heterozygote deficiency, because in most populations they had fewer than two alleles. Such loci should be given further research attention, as they might be linked to or located in vital genes that minimize recombination frequency in order to prevent loss of vital function. In contrast to these loci, locus CA_11 showed excess heterozygosity, which suggests heterozygote advantage at that locus. Multi-allelic loci are of greater value for poorly studied crops like anchote, as they can serve as molecular barcodes for variety/accession identification or can play a role in the efficient conservation and management of anchote germplasm in Ethiopia and beyond. Hence, loci CA_06 and CA_01 should get priority for these purposes, as they have higher PIC than other loci. In this study, low but significant differentiation was obtained between the accessions (FST = 0.15; P < 0.001), with 54.2% of the accession-pairs showing significant differentiation. A similar study using inter simple sequence repeat (ISSR) markers revealed higher population differentiation (FST = 0.49) (Bekele et al. 2014). Lower population differentiation and reduced genetic diversity are expected when EST-SSRs are used instead of neutral markers such as ISSRs, as the mutation rate in EST-SSRs is relatively low (Serra et al. 2007). Hence, significantly differentiated accessions need to be given priority for both conservation and breeding purposes. The low population differentiation is attributable to the moderate level of gene flow between anchote populations, similar to findings in other studies (Wickert et al. 2012). In future germplasm collecting missions, districts from which more diverse populations were obtained should be given priority in order to capture further diversity.
Analysis of molecular variance revealed that about 85% of the total variation was accounted for by the variation within accessions. This is probably due to direct germplasm exchange between farmers of different districts. The genetic distance between populations did not follow the geographic distance between the sampling sites of the accessions. The neighbor-joining cluster analysis also clearly showed a complex pattern of clustering that is not in line with the geographic origin of the accessions. The population structure analysis revealed a high level of admixture between the accessions. The analysis also revealed little difference between the genotypes in terms of the likelihood of their membership in each of the genetic populations, indicating a lack of clear population structure in anchote. The anchote accessions included in the present study were collected from areas where the local communities share a similar socio-economic and food culture, which might have resulted in lower than expected differentiation/diversity between the accessions and a lack of clear population structure. It would be interesting to analyze anchote germplasm representing local communities with different socio-economic and food cultures in future studies.

Conclusions
The development of novel EST-SSR markers, and their application to the analysis of genetic diversity, is the first of its kind in anchote. Although the number of markers developed in this study is small, they have an important role to play in the characterization of anchote genetic resources for conservation and breeding purposes. The result of this study also serves as additional evidence supporting the cross-species transferability, within a family, of a good proportion of EST-SSR markers. The present study revealed a moderate level of genetic diversity in anchote accessions and suggested potential hotspots for conservation. However, further research including more germplasm and molecular markers is required in order to obtain a clearer picture of its population genetics as well as for its improvement through breeding. Hence, this study serves as a motivation for researchers to develop genomic tools for this vital but orphan crop, so that modern breeding tools such as marker-assisted selection (MAS) and genomic selection (GS) can be applied for accelerated delivery of high-yielding and "climate smart" anchote varieties with wider adoption, which in turn contributes towards food and nutrition security in Ethiopia and Eastern Africa.