Introduction

Enset (Ensete ventricosum, (Welw.) Cheesman), commonly known as false banana or Abyssinian banana, is a perennial, diploid (2n = 18), monocarpic species belonging to the genus Ensete in the family Musaceae (Westphal et al. 1975). Enset, banana and plantains are the most important cultivated members of the family, all with high global and local economic and food security importance (Baker and Simmonds 1953; Simmonds 1962). The genus Ensete consists of three species with extensive geographical distribution, E. ventricosum and E. livingstonianum in Africa and E. glaucum in Asia, and five other localized endemic or near-endemic species (Borrell et al. 2019). Enset is the only cultivated species of the genus, and its domestication and utilization as a food and fiber crop are so far restricted to Ethiopia. It was domesticated in Ethiopia as early as 10,000 years ago (Brandt et al. 1997). Enset is highly drought tolerant with a wide agroecological distribution and is cultivated only with household-produced inputs (Brandt et al. 1997; Tsegaye and Struik 2002). It is unknown whether its wide distribution across a range of altitudes involves genetic or phenotypic adaptation (Tsegaye 2002). Eighty percent of the enset production is concentrated in the southern and southwestern part of Ethiopia (Bezuneh et al. 1967), where it serves as a staple and co-staple food for about 25 million people (Borrell et al. 2020; Brandt et al. 1997; Spring et al. 1996). Furthermore, it is used for several other purposes, such as animal feed, fiber, construction material and traditional medicine. The crop grows best at cooler, higher altitudes and is found mostly between 1200 and 3100 m above sea level (Brandt et al. 1997). Enset plants grow to 4–8 m, sometimes up to 11 m, in height. Cultivated enset is propagated vegetatively, while wild enset reproduces through seeds (Birmeta et al. 2004; Borrell et al. 2019; Tsegaye and Struik 2001).
Enset is usually harvested 4–6 years after transplantation, but age at harvest varies between 3 and 12 years (Borrell et al. 2020; Brandt et al. 1997). Thus, if other crops fail, enset plants can be harvested at any time, providing security against hunger for farmers and their families. This became evident during the great famine in Ethiopia in the years 1888 to 1892 (Tobiaw and Bekele 2011), and is the reason why enset is called “The Tree Against Hunger” (Brandt et al. 1997; Costa 1984). This is an important consideration for introducing enset to other, more food-insecure regions in Ethiopia, particularly in the dry north.

Ethiopia is the center of origin of many plant species, including enset (Engels and Hawkes 1991). The presence of wild and cultivated enset indicates that Ethiopia is the primary center of origin and center of diversity (Purseglove 1985; Vavilov 1951). Ethnic groups in Ethiopia recognize and exploit various enset landraces. Regions in Ethiopia with diverse cultural history have rich biodiversity (Tsegaye 2002). The enset-based farming system is a major agricultural system, and farmers cultivate many enset landraces across various climatic and agroecological systems (Borrell et al. 2019). Research on the genetic diversity of specific enset accessions from local regions using molecular markers such as amplified fragment length polymorphism (AFLP) (Negash et al. 2002; Tesfamicael et al. 2020), random amplified polymorphic DNA (RAPD) (Birmeta et al. 2004), inter simple sequence repeats (ISSR) (Tobiaw and Bekele 2011), chloroplast DNA sequences (Bekele and Shigeta 2011), simple sequence repeats (SSR) (Gerura et al. 2019; Getachew et al. 2014; Olango et al. 2015; Biswas et al. 2020; Nuraga et al. 2022) and single nucleotide polymorphisms (SNPs) (Tesfamicael et al. 2020) has revealed genetic diversity among and within wild and cultivated enset accessions. SNP markers are powerful tools for estimating genetic similarities and diversity. They are abundant and robust, suitable for automated high-throughput genotyping of many samples, able to resolve differences among extremely similar individuals, and increase the accuracy of diversity estimates (Hinze et al. 2017). The double-digest restriction-site associated DNA (ddRAD) technique is a powerful and relatively cost-effective approach for developing numerous SNP markers and constructing high-density genetic maps (Peterson et al. 2012). It has been used extensively for population genetic research in a wide range of non-model organisms (Andrews et al. 2016; Peterson et al. 2012).

Because cultivated enset is vegetatively propagated, genetic divergence among clones may be minimal and could be difficult to detect using these marker types (McKey et al. 2010). Moreover, different molecular markers have different properties and will reveal different aspects of genetic diversity (Karp et al. 1997). The investigations mentioned above were conducted in certain enset growing areas in the southern and southwestern part of the country. Since Ethiopia is the center of diversity, many enset-rich locations harboring large amounts of diversity of cultivated and wild enset are yet to be studied and are not represented in ex situ collections. Enset clones have traditionally been characterized phenotypically; however, phenotypic description is limited by the cost, time and space required to make visual observations and measurements (Hinze et al. 2017).

Despite the abundance, diversity and ecological importance of enset, the species is not well characterized at the genomic level and has been far less studied than other cultivated species in the family Musaceae (Borrell et al. 2019). More detailed diversity research of both cultivated and wild enset accessions in Ethiopia is necessary to meet future needs, including diversification of crops in more vulnerable regions in Ethiopia. Novel sources of genetic diversity need to be identified, characterized, incorporated into breeding programs, and utilized for the development of non-redundant core collections for conservation and breeding. In this study, SNP markers were developed and used to understand the population divergence of cultivated and wild enset. Understanding the genetic basis of enset domestication provides a valuable foundation for enset conservation and genetic improvement. The objectives of the present study were: (1) to evaluate the efficacy and suitability of SNP markers developed from ddRAD sequencing for high-throughput genotyping of enset; (2) to assess population structure, genetic diversity, and relationships among and within cultivated and wild enset accessions, and (3) to identify candidate genes potentially subjected to domestication and selection.

Materials and methods

Sampling area

The Southern Nations, Nationalities and Peoples’ Region (SNNPR) state has a total area of 117,506 km², with altitudes ranging from 378 to 4,201 m above sea level (m a.s.l.) (Abebe 2005). Enset accessions were collected from three main enset culture communities, which are densely populated enset cultivating administrative regions (Sidama, Gurage and South Omo). The wild enset accessions were collected around farms, along riversides and in deep forests. The three collection regions were deliberately chosen based on their enset production potential in SNNPR, where more than two-thirds of the country’s enset production is located (Zeberga et al. 2014). We collected 226 cultivated and 10 wild enset accessions originating from different geographical locations and agroecological zones (Table 1; Supplementary Table 1). The major ethnic regions cultivating enset and the study areas in Southern Ethiopia are shown in Fig. 1.

Table 1 Enset (Ensete ventricosum (Welw.) Cheesman) plant materials from Ethiopia used for genetic diversity analysis
Fig. 1 An overview of the study districts in Ethiopia (A) and their detailed locations in Gurage (B), South Omo (C) and Sidama (D)

Preparation of NaCl-CTAB preservation solution and sample collection

A saturated NaCl-CTAB solution was used to preserve the enset leaf samples upon collection, as described by Rogstad (1992) with minor modifications. Briefly, 550 g NaCl was added to 1 L of water, boiled, cooled at ambient temperature, and mixed thoroughly until the salt precipitated. Then, 35 g of CTAB was added gradually with intermittent mixing until the solution became viscous. Aliquots of 35–40 mL of the prepared solution were transferred into 50 mL Falcon tubes and used for preservation of tissue samples. A pair of scissors was used to remove leaf samples from the mother plants, and the scissors were cleaned with ethanol (96%) between independent samples. Fresh cigar-leaf samples harvested from each enset accession were stored immediately in the 50 mL tubes containing the saturated NaCl-CTAB preservation solution. Samples were then placed in a black plastic bag and stored in a dark room at ambient temperature to protect genomic DNA from degradation during transportation from the farmers' fields in Ethiopia to the laboratory in Norway.

DNA extraction

DNA was extracted from the preserved leaf samples using the DNeasy Plant Mini Kit (QIAGEN, Hilden, Germany). DNA quality and quantity were determined using a NanoDrop spectrophotometer (Thermo Fisher, Inc.) and agarose gel electrophoresis (1%). DNA concentrations were determined using the Qubit® dsDNA BR assay kit (Life Sciences) and Quant-iT™ PicoGreen™ (Life Sciences) dsDNA assay.

Double-digest restriction-site-associated DNA (ddRAD) library preparation and Illumina sequencing

We calculated the number of reads required for 20X coverage of restriction fragments in the 150–500 bp size range across 10 multiplexed individuals using multiple enzyme pairs, assuming 0.44 GC content, to ensure that restriction fragments could feasibly be sequenced with enough coverage on an Illumina MiSeq platform. The ddRAD procedure used in this study was modified from Peterson et al. (2012) (for further ddRAD information and the complete protocol, see Supplementary information, Table 2–7; Supplementary Table 1). For each DNA sample, 500 ng was double digested using the restriction endonucleases EcoRI-HF (the “rare cutter”, which recognizes a six-base motif, 5′-GAATTC) and MseI (the “frequent cutter”, which recognizes a four-base motif, 5′-TTAA), and adapters were ligated to the digested fragments. Each DNA sample received a unique P1 barcode and a P2 barcode common to all samples. Samples containing unique P1 barcodes were pooled, and the Sage Science Blue Pippin system (https://sagescience.com/) was used to select fragments of about 600 bp to reduce the possibility of unknown introns in the selected sequences and maximize the chances of obtaining SNPs. Size-selected libraries were bound to Dynabeads® M-270 Streptavidin magnetic beads (Invitrogen) to eliminate fragments without the P2 adapter, and the libraries were amplified by PCR using the Phusion™ Polymerase kit (Invitrogen) and index-marked primers for further tagging of the samples. The libraries were analyzed using an Agilent 2100 Bioanalyzer and diluted to a concentration of 35 nM for paired-end sequencing using the V2 sequencing kit on the MiSeq platform (Illumina). The sequencing was performed at the Norwegian University of Life Sciences, Norway.
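
The read-budget estimate described above can be illustrated with a minimal sketch. It assumes a simplified in silico double digest in which motif start positions stand in for the exact cut sites, and only fragments flanked by one EcoRI and one MseI site are counted. The function names and the toy sequence are hypothetical and are not taken from the protocol used in this study.

```python
import re

ECORI = "GAATTC"  # rare cutter (six-base motif)
MSEI = "TTAA"     # frequent cutter (four-base motif)


def cut_positions(seq, motif):
    """Start positions of every occurrence of motif (overlaps allowed)."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]


def ddrad_fragments(seq, min_len=150, max_len=500):
    """Lengths of fragments bounded by one EcoRI and one MseI site.

    Simplification: distances are measured between motif start positions,
    not between the true cut offsets within each recognition site.
    """
    sites = sorted(
        [(p, "E") for p in cut_positions(seq, ECORI)]
        + [(p, "M") for p in cut_positions(seq, MSEI)]
    )
    lengths = []
    for (p1, e1), (p2, e2) in zip(sites, sites[1:]):
        # Keep only neighbouring sites cut by different enzymes
        if e1 != e2 and min_len <= p2 - p1 <= max_len:
            lengths.append(p2 - p1)
    return lengths


def reads_required(n_fragments, coverage=20, n_individuals=10):
    """Reads needed so each fragment is covered `coverage` times per individual."""
    return n_fragments * coverage * n_individuals


# Toy sequence (hypothetical): EcoRI ... MseI ... MseI ... EcoRI
toy = "GAATTC" + "C" * 200 + "TTAA" + "G" * 50 + "TTAA" + "C" * 300 + "GAATTC"
fragments = ddrad_fragments(toy)
print(fragments, reads_required(len(fragments)))
```

On a real genome the fragment count would come from the full reference or an estimate based on genome size and GC content; the multiplication in reads_required is the only part that mirrors the 20X by 10 individuals budget stated in the text.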

Sequence data analysis and SNP calling

The GBS data obtained, in FASTQ format, were quality checked using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). High-quality reads were retained after trimming low-quality reads using the Trimmomatic program (Bolger et al. 2014). The raw paired-end sequence reads obtained from the MiSeq were quality checked after removing the adapters and barcodes. The clean paired-end reads were used to call SNPs using the STACKS 2 pipeline (Rochette et al. 2019). The SNPs were filtered based on the following criteria: (1) only bi-allelic SNPs were retained, (2) SNPs with more than 20% missing information were excluded, (3) genotypes with more than 20% missing information were excluded, and (4) markers with minor allele frequency (MAF) > 0.05 were retained.
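
The filtering criteria above can be sketched as follows. This is an illustrative sketch only, not the STACKS 2 implementation: it assumes bi-allelic genotypes already coded as counts of the alternative allele (0, 1 or 2, with None for a missing call), so criterion (1) is taken as given by the encoding, and it interprets "genotypes" in criterion (3) as individual accessions. All function names and the toy data are hypothetical.

```python
def missing_fraction(calls):
    """Fraction of missing (None) calls; an empty list counts as fully missing."""
    if not calls:
        return 1.0
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls):
    """MAF from 0/1/2 alternative-allele counts, ignoring missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt = sum(observed) / (2 * len(observed))
    return min(alt, 1 - alt)


def filter_snps(genotypes, max_missing=0.20, min_maf=0.05):
    """genotypes maps SNP id -> list of calls, one per accession.

    Returns the filtered mapping and the indices of retained accessions.
    """
    # Criterion (2): drop SNPs with more than max_missing missing calls
    kept = {s: c for s, c in genotypes.items() if missing_fraction(c) <= max_missing}
    if not kept:
        return {}, []
    n_samples = len(next(iter(kept.values())))
    # Criterion (3): drop accessions with more than max_missing missing calls
    good = [
        i for i in range(n_samples)
        if missing_fraction([c[i] for c in kept.values()]) <= max_missing
    ]
    kept = {s: [c[i] for i in good] for s, c in kept.items()}
    # Criterion (4): retain SNPs with MAF above the threshold
    kept = {s: c for s, c in kept.items() if minor_allele_frequency(c) > min_maf}
    return kept, good


# Toy data (hypothetical): four SNPs scored in five accessions
toy = {
    "snpA": [0, 1, 2, 1, 0],
    "snpB": [None, None, 0, 1, 2],   # 40% missing, removed
    "snpC": [0, 0, 0, 0, 0],         # monomorphic, removed by MAF
    "snpD": [0, 1, 1, 2, None],
}
filtered, retained = filter_snps(toy)
print(sorted(filtered), retained)
```

The order of the missingness steps matters, since removing poor SNPs first changes the per-accession missing rate; the order used here follows the numbering in the text and is an assumption about how the filters were applied.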

Population structure analysis

Population groups were inferred using the fastSTRUCTURE software (Raj et al. 2014). Twenty independent test runs were conducted allowing K to vary from 1 to 20. The optimal value of K for these runs was then determined using the ChooseK script included with the fastSTRUCTURE package, which selects the number of subpopulations that maximizes the marginal likelihood. The cluster membership matrices of the fastSTRUCTURE outputs were visualized using the StructureSelector tool (Li and Liu 2018). Following the assignment of individuals to populations, the program package CLUMPAK (Kopelman et al. 2015) was used to summarize the structure results into structure plots.
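
The selection rule described above, choosing the K that maximizes the marginal likelihood, can be sketched in a few lines. This is not the ChooseK script itself: averaging across replicate runs is an assumption made here for illustration, and the likelihood values below are invented toy numbers.

```python
from statistics import mean


def choose_k(likelihoods_by_k):
    """Return the K whose mean marginal likelihood across runs is highest.

    likelihoods_by_k maps K -> list of marginal likelihoods, one per run.
    """
    mean_by_k = {k: mean(values) for k, values in likelihoods_by_k.items()}
    return max(mean_by_k, key=mean_by_k.get)


# Toy values (hypothetical), two runs per K
toy_runs = {
    10: [-0.86, -0.85],
    11: [-0.84, -0.83],
    12: [-0.82, -0.82],
    13: [-0.83, -0.84],
}
best_k = choose_k(toy_runs)
print(best_k)
```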

Genetic diversity analysis

For genetic diversity analysis, subpopulations were defined as the number of clusters produced by fastSTRUCTURE at K = 12. Genetic diversity among and within populations, observed (HO) and expected (HE) heterozygosity, and pairwise fixation index (FST) for the subpopulations (Weir and Cockerham 1984) were estimated by analysis of molecular variance (AMOVA) using Arlequin v.3.5 (Excoffier and Lischer 2010). Significance (P < 0.05) of the FST values was estimated using 1023 permutations. FST results were interpreted using the same standard as in Pino Del Carpio et al. (2011), Hartl et al. (1997) and Wright (1978).
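
For readers unfamiliar with these statistics, the following sketch shows how HO, HE and a simple FST can be computed for a single bi-allelic locus. It is a simplified illustration that uses the heterozygosity-based definition FST = (HT − HS)/HT with equal population weights; it is not the Weir and Cockerham (1984) estimator implemented in Arlequin, and the genotype coding (0, 1 or 2 copies of the alternative allele, None for missing) and function names are assumptions.

```python
def allele_freq(calls):
    """Alternative-allele frequency from 0/1/2 calls, ignoring missing data."""
    observed = [c for c in calls if c is not None]
    return sum(observed) / (2 * len(observed))


def observed_het(calls):
    """HO: proportion of heterozygous individuals (call == 1)."""
    observed = [c for c in calls if c is not None]
    return sum(c == 1 for c in observed) / len(observed)


def expected_het(calls):
    """HE: 2pq under Hardy-Weinberg proportions."""
    p = allele_freq(calls)
    return 2 * p * (1 - p)


def fst_locus(pop_a, pop_b):
    """Single-locus FST = (HT - HS) / HT for two equally weighted populations."""
    p_a, p_b = allele_freq(pop_a), allele_freq(pop_b)
    h_s = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2  # mean within-population HE
    p_bar = (p_a + p_b) / 2
    h_t = 2 * p_bar * (1 - p_bar)                           # HE of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Toy genotypes (hypothetical) for one locus in two populations
pop_a = [0, 0, 1, 1]
pop_b = [2, 2, 1, 1]
print(observed_het(pop_a), expected_het(pop_a), fst_locus(pop_a, pop_b))
```

Comparing HO with HE per population indicates a heterozygote excess (HO > HE) or deficit (HO < HE) at the locus.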

Phylogenetic trees and PCA analyses

To examine the relationship between cultivated and wild enset accessions, Principal Component Analysis (PCA) was performed using TASSEL v5.2 (Bradbury et al. 2007), and maximum-likelihood (ML) phylogenetic tree analyses were performed using PhyML 3.0 (Guindon et al. 2010). The trees were prepared and visualized using the iTOL v4 online tool (Letunic and Bork 2019). PCAs were graphically summarized using scatter plots. Populations were named according to the passport data denoting geographical origin.

FST outlier tests for detecting SNP loci under selection

To detect loci under directional selection, we used the hierarchical method (Excoffier et al. 2009), a modified approach of Beaumont and Nichols (1996), implemented in the ARLEQUIN software package version 3.5.1.3 (Excoffier and Lischer 2010). We employed a hierarchical island model based on two groups (cultivated and wild enset) with 50,000 simulations to generate the joint distribution of FST versus heterozygosity. Loci that fell outside the 99% confidence intervals of the distribution were identified as outliers putatively under selection. The putative function of genes with outlier SNPs was identified by Gene Ontology (GO) annotation using the Blast2GO software tool version 3.0 (Conesa et al. 2005).
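
The outlier logic can be sketched in simplified form. The sketch below assumes a single simulated null distribution of FST and ignores the conditioning on heterozygosity and the hierarchical island model used in ARLEQUIN; loci above the upper bound of the 99% interval are labelled as candidates for directional selection and loci below the lower bound as candidates for balancing selection. Function names and toy values are hypothetical.

```python
def quantile(sorted_values, q):
    """Quantile of an ascending list using linear interpolation."""
    idx = q * (len(sorted_values) - 1)
    lo = int(idx)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = idx - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def classify_loci(observed_fst, null_fst, level=0.99):
    """Label each locus relative to the central `level` interval of the null."""
    null_sorted = sorted(null_fst)
    tail = (1 - level) / 2
    lower = quantile(null_sorted, tail)
    upper = quantile(null_sorted, 1 - tail)
    labels = {}
    for locus, fst in observed_fst.items():
        if fst > upper:
            labels[locus] = "directional"   # unusually high differentiation
        elif fst < lower:
            labels[locus] = "balancing"     # unusually low differentiation
        else:
            labels[locus] = "neutral"
    return labels


# Toy null distribution and observed values (hypothetical)
null = [i / 1000 for i in range(1001)]
observed = {"locus_a": 0.999, "locus_b": 0.5, "locus_c": 0.001}
labels = classify_loci(observed, null)
print(labels)
```

In the actual method the interval depends on each locus's heterozygosity, so a faithful implementation would compute the bounds within heterozygosity bins rather than from one pooled distribution.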

Results

SNP discovery and filtering

Following sequencing of the double digest RAD fragments, data processing and SNP filtering, the alleles with high heterozygosity (> 80%) were removed and a total number of 3505 high-quality SNPs were identified among the 236 enset accessions.

Genetic structure

The genetic structure analysis using fastSTRUCTURE suggests that the most likely number of subpopulations is 12, i.e., the model complexity that maximizes likelihood is 12 (likelihood = −0.82) and the highest peak shows K = 12 as optimal (Fig. 2A). The results of the fastSTRUCTURE analysis are shown in Fig. 2B. The 10 wild enset accessions from South Omo make up a distinct group which is stable at all levels above K = 7. The 62 cultivated enset accessions from Gurage separated from the Sidama and South Omo accessions and seem to make up a rather unique subpopulation, while the cultivated enset accessions from Sidama (72) and South Omo (92) represent many subpopulations.

Fig. 2 Population structure. (A) Model complexity that maximizes likelihood is 12 and the highest peak on the graph shows the best K = 12. (B) Population structure based on fastSTRUCTURE output resulting in K = 12 being the most likely number of genetic clusters, where each cluster is represented as a different shade and each bar represents an individual within each geographic region/cultivation status; colors represent the groups identified. 1: Gurage (cultivated enset); 2: Sidama (cultivated enset); 3: South Omo (cultivated enset); 4: wild enset (from South Omo)

PCA and phylogenetic relationships

Results from the principal component analysis (PCA) are presented in Fig. 3. The PCA showed that some of the populations were clearly separated while others clustered more closely. The first three components described 20, 18 and 9% of the total variance, respectively. PC1, with some overlap, separates Gurage accessions from accessions of the other regions, while PC2 separates the South Omo accessions into two clusters, one of them overlapping with the Sidama cluster. However, the Sidama and South Omo accessions in this cluster are partly separated by PC3, with the wild accessions clustering, as expected, with the South Omo subcluster. The phylogenetic analysis grouped the enset accessions into different clusters, to a large degree reflecting geographical origins and cultivation status (Fig. 4). The wild accessions formed a clade clearly distinguished from the cultivated enset accessions (Fig. 4B). Some accessions of cultivated enset tended to have longer branches (Fig. 4B). Interestingly, twelve accessions (19.35%) collected from the Sidama region clustered with Gurage accessions. Surprisingly, however, no Gurage accessions clustered with Sidama accessions in this study. In addition, four accessions collected from Sidama and two from Gurage clustered with South Omo accessions. Some accessions have the same names in different regions, e.g., Gena, Astara and Mazia; however, they are certainly different accessions since they cluster in different clades in the phylogenetic tree (Fig. 4). The phylogenetic analysis confirms the results of the structure analysis: the most genetically unique accessions, apart from the wild accessions, are the Gurage accessions, while accessions from Sidama seem to have a mixed ancestry and the South Omo accessions clearly represent two genetically diverse subgroups.

Fig. 3 Principal component analysis (PCA) plots of PC1 and PC2 (A), and PC1 and PC3 (B). The percentages in brackets indicate the variance explained by the different PCs

Fig. 4 Phylogenetic tree. (A) Maximum-likelihood phylogenetic tree with branch length displayed; (B) Topological view of the maximum-likelihood phylogenetic tree. Accessions are numbered as in Supplementary Table 1 and colored according to their geographical origin and cultivation status, i.e., South Omo = blue; Sidama = orange; Gurage = purple; wild enset = green

Genetic diversity and pairwise population differentiation

The results of the AMOVA analysis are presented in Table 2. Most of the genetic diversity (91.2%) is within the enset accessions, and very little (8.8%) between accessions. Analysis of the genetic differentiation between the geographic regions showed that 92.4% of the genetic diversity was within and only 7.6% between geographic regions (Table 2). Observed (HO) heterozygosity was slightly higher in Sidama and the wild group (HO = 0.33 and 0.32, respectively) than in Gurage and South Omo (both HO = 0.31), while expected (HE) heterozygosity was slightly lower than observed in Sidama and Gurage and higher than observed in South Omo and the wild group (Table 3). Generally, the molecular diversity was highest within the wild accessions and lowest within cultivated enset from Gurage. Pairwise population differentiation (FST) showed, as expected, that the largest subpopulation division is between the wild accession group and the cultivated groups (moderate to large differentiation, FST = 0.14–0.17), with the largest differentiation between the wild and the Gurage group. Between the cultivated enset groups, the largest differentiation is between Gurage and South Omo (FST = 0.10), while differentiation between Sidama and the other two groups is smaller (FST = 0.06–0.07) (Table 4).

Table 2 Analysis of molecular variance (AMOVA) among and within populations and regions of cultivated and wild enset
Table 3 Genetic diversity among cultivated and wild enset populations based on 3505 SNP markers
Table 4 Average pairwise population differentiation (FST)

Signatures of selection and functional analyses

Signatures of balancing and directional selection were identified at 35 loci among cultivated and wild accessions using the hierarchical method (Excoffier et al. 2009) (Fig. 5). Putative balancing selection was detected at 23 loci and directional selection was detected at 12 loci (Fig. 5). Among the 12 loci, six loci have putative gene functions, while the other six loci have unknown gene functions. Putative functions of these six loci are described in Table 5.

Fig. 5 Candidate loci under selection identified using the FST-based outlier approach (hierarchical structure model using Arlequin 3.5). FST: locus-specific genetic divergence among the populations; heterozygosity: measure of heterozygosity per locus. Loci significant at the 1% level are indicated by red dots

Table 5 Functional annotation of outlier single nucleotide polymorphisms (SNPs) potentially involved in domestication of enset

Discussion

Population structure and differentiation between wild and cultivated Enset

In this study, a high-throughput sequencing technology was used to explore genetic diversity, population structure, and selection signatures in cultivated and wild enset accessions collected across the center of origin and domestication in Ethiopia. The ancestral admixture and phylogenetic analyses showed a clear separation between wild and cultivated enset (Fig. 2A, 4). Most probably this separation between wild and cultivated enset populations can be attributed to the difference in propagation methods (Birmeta et al. 2004; Gerura et al. 2019; Olango et al. 2015; Tesfamicael et al. 2020). It is interesting to note that cultivated enset accessions collected from regions where wild enset grows showed higher admixture and weaker clustering than those collected from regions where wild enset does not grow. This could be due to higher enset diversity in the region with wild enset, and it indicates exchange of genetic material by crossing between cultivated and wild enset. In addition, the phylogenetic tree analysis showed that populations from adjacent regions like Sidama and Gurage formed a polyphyletic group, which was not the case with distantly located populations, e.g., populations from Sidama and South Omo (Fig. 4B). This genetic structure could be explained by a combination of local genetic drift and the founder population. However, the analyses showed admixture of very few accessions irrespective of their origins, whether the regions were far apart or adjacent, like Sidama and Gurage (Fig. 4). Remarkably, some accessions collected from Sidama clustered with Gurage accessions, suggesting that these accessions are most probably of Gurage origin. However, no Gurage accessions clustered with accessions from Sidama. In addition, four accessions from Sidama and two from Gurage clustered with South Omo accessions. Taken together, this indicates human sharing and exchange of some clonal materials among and within regions (Gerura et al. 2019; Getachew et al. 2014). As pointed out earlier, some accessions have the same vernacular names in different regions, e.g., Gena, Astara and Mazia. However, they are certainly different accessions genetically based on their SNP profiles, and they have not been exchanged by humans even if they have the same vernacular name.

The phylogenetic tree showed long branches for the wild population from South Omo and for a few cultivated enset accessions (Figs. 4A, 4B), suggesting high rates of nucleotide substitution and consequently high diversity. Furthermore, the phylogenetic tree revealed a relatively close association between the South Omo and Sidama enset populations (Figs. 4A, 4B), and the lowest FST value was found between these two populations (Table 4). This shows that the Sidama and South Omo populations are closely related, which might be due to a vicariant evolutionary event from a single common ancestor through the fragmentation of the ancestor's range, or to a historical relationship (Schaal et al. 1998).

Values of the fixation index (FST) above 0.15 indicate significant differentiation between populations (Frankham et al. 2002). In this study, we observed significant divergence between enset populations. The wild population showed moderate to large genetic differentiation from the cultivated populations from the three regions, while there was relatively small differentiation between the cultivated populations. Cultivated enset is only propagated vegetatively and farmers harvest enset before seed set, while wild enset is propagated exclusively by sexual reproduction (Birmeta et al. 2004; Brandt et al. 1997). As a result, gene flow between cultivated and wild enset is probably very limited. In addition, the natural distribution of wild enset, as well as the farming and management practices of cultivated accessions, have an impact (Birmeta et al. 2004; Olango et al. 2015). Further, limited exchange of genetic material by humans or natural factors may be considered the main reason for the larger genetic differentiation observed between wild and cultivated populations (Birmeta et al. 2004; Gerura et al. 2019; Tesfamicael et al. 2020).

Importantly, moderate genetic differentiation was found between wild and cultivated enset from South Omo. This might be due to the co-existence of wild and cultivated enset in the South Omo region, where farmers introduce wild accessions into the cultivation areas and hence genetic exchange occurs between cultivated and wild populations of enset in this particular region (Shigeta 1992). In contrast, the highest FST value (0.17) was observed between wild enset and accessions from Gurage (Table 4). This shows that these accessions are more isolated from one another; most likely there is no wild enset growing in the Gurage region. A similar result can be seen in the population structure and phylogenetic analyses. The Gurage accessions are separated and formed a single cluster of their own, far from the wild enset cluster (Fig. 3, 4). Another possible reason is that Gurage may have a different cultural and ethnic origin. This indicates that there is unique genetic diversity within the Gurage accessions, which is not related to the geographical distance to the other regions investigated in our study. In addition, the Sidama and wild enset populations showed higher differentiation from one another. Most probably accessions from these regions are not currently interbreeding and there is no sharing of planting materials. Concerning cultivated enset, accessions from Gurage and South Omo show low connectivity (Fig. 3, 4). This might be due to a distinct genetic profile within Gurage and South Omo accessions and possibly infrequent exchange of accessions between the two regions. Our SNP data indicate that the cultivated and wild enset accessions are very divergent. In addition, the principal component and phylogenetic tree analyses grouped the 236 enset accessions into four major clusters, where the wild individuals clustered separately. Other enset diversity research has also reported a high level of genetic differentiation between cultivated and wild enset accessions (Birmeta et al. 2004; Gerura et al. 2019; Olango et al. 2015; Tobiaw and Bekele 2011). Also, a geographic pattern of genetic structure was observed, with consistent distinct grouping of cultivated enset accessions from Sidama, Gurage and South Omo. This knowledge of population structure and genetic diversity between cultivated and wild enset accessions is crucial for future research and breeding for new introductions.

Genetic diversity within and across populations

The large regional variation in agroecological conditions, the different cultures and management practices, and the relatively large geographic distances between the different enset growing regions within the country should result in large genetic diversity among regions. However, multiple lines of evidence show that the level of genetic diversity among regions (geographical areas) is low. For instance, AFLP analysis of 192 enset accessions from six growing regions showed a limited proportion of diversity among growing regions (11–13%), but considerable diversity within regions (87–89%) (Tesfamicael et al. 2020). Earlier research also found limited diversity among growing regions compared to within regions, i.e., 13% using AFLPs (Tesfamicael et al. 2020), 4.8% using AFLPs (Negash et al. 2002) and 16% using SSR (Olango et al. 2015). These values indicate that the high proportion of genetic diversity within regions is a general feature of the enset species.

In the current study, the low genetic structuring among enset regions observed both in the average pairwise FST values and in the AMOVA indicates that allele sharing between regions is high. The AMOVA analysis showed that the level of genetic diversity among regions is limited (7.6%) and very high within regions (92.4%) (Table 2). This is also evident from the low FST values observed between the cultivated enset accessions from the different growing regions (Table 4). These results show that genetic diversity in enset accessions is less affected by the region of origin (Schaal et al. 1998), but has rather been shaped by a long history of extensive human exchange of clonal materials among regions, and different communities may select different sources of the germplasm to suit their specific cultural needs (Gerura et al. 2019; Getachew et al. 2014; Negash et al. 2002). Furthermore, there have most probably also been extensive exchanges of clones, particularly between highland and lowland regions, because farmers in the latter area believe that suckers imported from the mountain areas grow better than those raised locally (Tesfaye and Lüdders 2003). Because of the large genetic diversity among accessions within regions, clonal selection based on desirable traits may be effective for most of the natural populations in Ethiopia. In addition, the large genetic diversity within regions may be partly explained by gene flow and common origin of the populations. According to some investigations, large genetic diversity within populations is not necessarily caused by environmental heterogeneity, but could be due to historical patterns of relationship (Schaal et al. 1998).

In the present study, 3,505 SNP markers that were polymorphic among 236 (226 cultivated and 10 wild) enset accessions were detected. This number of SNPs might be considered low relative to the 5,011 SNPs detected from 141 (120 cultivated and 21 wild) enset accessions studied by Tesfamicael et al. (2020). Moreover, the observed heterozygosity (HO) and expected heterozygosity (HE) are low (Table 3) compared to research using other DNA marker systems such as ISSR (Getachew et al. 2014) and SSRs (Gerura et al. 2019; Olango et al. 2015), but higher than with AFLP markers, which revealed lower observed and expected heterozygosity in cultivated and in wild enset populations (Tesfamicael et al. 2020). However, it is difficult to make direct comparisons between previous studies and the present study, due to differences in the number and types of the studied enset accessions and the different SNP calling and filtering parameters applied. The reasons for the relatively low number of SNP markers detected in the present study could be frequent vegetative propagation and sharing of clones among farmers, which will reduce polymorphism. SNPs were filtered across cultivated and wild accessions; thus, the total number depends on sites that are polymorphic in the cultivated enset. If additional wild enset accessions or cultivated enset from other enset growing regions had been included in the study, the number of SNPs would probably have been higher. To rule out technical artifacts in SNP calling, we tried both the STACKS and TASSEL GBS methods for SNP calling, and both yielded a low number of SNPs. Also, different molecular markers have different properties and will scan different regions of the genome (Karp et al. 1997).

In this context, it is interesting that wild enset had lower levels of heterozygosity than expected (Table 3). This is consistent with wild enset being sexually propagated within a restricted area, which limits gene flow and leads to inbreeding and increased homozygosity (Birmeta et al. 2004; Shigeta 1992). Moreover, suitable habitats for wild enset have been declining sharply in Ethiopia because of population growth and deforestation, and the geographical range of wild enset is more limited, possibly due to more specific ecological requirements or, alternatively, loss of habitat (Birmeta et al. 2004; Borrell et al. 2019; Olango et al. 2015). The resulting reduction in effective population size might have contributed to the lower heterozygosity observed in wild enset by increasing the chances of inbreeding. This finding differs from what has been reported based on SSR markers (Olango et al. 2015).
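
As a rough illustration of how the comparison between observed and expected heterozygosity can be quantified, the following Python sketch computes mean observed heterozygosity (HO), mean expected heterozygosity (HE) under Hardy-Weinberg proportions, and the inbreeding coefficient F = 1 - HO/HE from biallelic SNP genotypes. The function name, the 0/1/2 genotype coding, the -1 missing-data code and the toy genotypes are assumptions made for illustration only; they are not the pipeline or the data of the present study, and HE is computed here without a small-sample correction.

```python
def heterozygosity_summary(genotypes, missing=-1):
    """Return (mean HO, mean HE, F) for biallelic SNP genotypes.

    genotypes: list of rows (one per individual), each a list of
    alternate-allele counts per SNP coded 0, 1 or 2 (hypothetical coding).
    """
    n_snps = len(genotypes[0])
    ho_values, he_values = [], []
    for j in range(n_snps):
        # Keep only individuals with a called genotype at this SNP.
        calls = [row[j] for row in genotypes if row[j] != missing]
        if not calls:
            continue  # SNP with no data contributes nothing
        n = len(calls)
        # Observed heterozygosity: fraction of heterozygous individuals.
        ho_values.append(sum(1 for c in calls if c == 1) / n)
        # Alternate-allele frequency and Hardy-Weinberg expectation 2pq.
        p = sum(calls) / (2 * n)
        he_values.append(2 * p * (1 - p))
    ho = sum(ho_values) / len(ho_values)
    he = sum(he_values) / len(he_values)
    # F > 0 indicates a heterozygote deficit, as expected under inbreeding.
    f = 1 - ho / he if he > 0 else float("nan")
    return ho, he, f


# Toy data (hypothetical): 4 individuals x 2 SNPs.
toy = [[0, 1], [2, 1], [0, 1], [2, 1]]
ho, he, f = heterozygosity_summary(toy)
print(f"HO={ho:.2f} HE={he:.2f} F={f:.2f}")
```

A positive F, as in a population where HO falls below HE, corresponds to the heterozygote deficit discussed above for wild enset, whereas F near zero or negative corresponds to the pattern seen in the cultivated populations.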

However, relatively high levels of heterozygosity were observed in all cultivated populations (Table 3), which is consistent with the outcrossing nature of enset during sexual reproduction (Brandt et al. 1997; Olango et al. 2015). Heterosis might improve enset phenotypes, so that growers favor heterozygous cultivars in the course of selective propagation (Oztolan-Erol et al. 2021). Further, the current levels of enset diversity reflect frozen variation, that is, diversity that arose through sexual reproduction in an ancestral population (Chapman et al. 2000). In addition, occasional gene flow from wild enset, and possibly from other enset species, may occur (Birmeta et al. 2004). Other possible causes of this type of clonal diversity are somatic mutations, introduction of new variation from outside the cultivated populations, and introduction of new landraces from other regions (Shigeta 1990; Tsykun et al. 2017). A further possible cause is that enset is a perennial, predominantly clonally propagated species that has been strongly selected for adaptability and productivity under cultivation, combined with different pollination mechanisms (Birmeta et al. 2004; Negash et al. 2002; Yemataw et al. 2016). According to Shank (1994), considerable clonal diversity is present within enset for characters associated with growth and adaptation.

In addition, the highlands of southern Ethiopia form the geographical center of enset cultivation (Vavilov 1997). According to Harlan (1951), high-altitude areas have high concentrations of diverse and unique landraces and can be designated as microcenters of enset diversity. All such factors, alone or in combination, have resulted in a high degree of genetic diversity in the enset accessions studied here. Most importantly, differences in genetic diversity among regions are most likely important for farmers: different accessions contribute to the high diversity observed at each site, which provides strong evidence for selection by humans. Enset diversity in Ethiopia may thus be extensive, but it is not effectively utilized, as the available germplasm is poorly known (Borrell et al. 2019).

Genetic signatures for differential selection between cultivated and wild enset

Little is known about the genetic makeup of, and population differentiation between, cultivated and wild enset. Knowledge about the genetic adaptation of enset is essential for breeding strategies. A central aim of evolutionary biology is to understand the molecular basis of adaptive differences between populations (Lotterhos and Whitlock 2014). Higher population differentiation is expected for adaptive SNPs than for neutral SNPs if adaptation to local environments is the principal source of genetic differentiation (De Villemereuil and Gaggiotti 2015). FST outlier approaches have been applied to many plant species, such as tomato (Sim et al. 2011), perennial ryegrass (Kovi et al. 2015), soybean (Li et al. 2014), European beech (Cuervo-Alarcon et al. 2018), banana (Hinge et al. 2022) and common bean (Papa et al. 2007), to identify adaptive differentiation. Markers detected in these species have been mapped to genomic regions with known QTL/genes related to domestication.
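
The logic of an FST outlier scan can be sketched in a few lines of Python. The example below computes a per-locus FST between two groups from their allele frequencies, using the heterozygosity-based definition FST = (HT - HS)/HT with equal group weights, and then flags loci above an empirical quantile of the FST distribution. This is a simplified illustration under stated assumptions: the function names, the allele frequencies and the quantile cut-off are hypothetical, and the empirical quantile stands in for the model-based confidence limits that dedicated outlier methods derive; it is not the procedure used in the present study.

```python
import math


def fst_per_locus(p1, p2):
    """FST = (HT - HS) / HT for one biallelic locus in two groups.

    p1, p2: alternate-allele frequencies in group 1 and group 2.
    """
    p_bar = (p1 + p2) / 2                  # pooled frequency, equal weights
    ht = 2 * p_bar * (1 - p_bar)           # total expected heterozygosity
    hs = p1 * (1 - p1) + p2 * (1 - p2)     # mean within-group heterozygosity
    if ht == 0:
        return 0.0                         # monomorphic locus: no differentiation
    return (ht - hs) / ht


def flag_outliers(fst_values, quantile=0.99):
    """Return indices of loci whose FST exceeds the empirical quantile.

    Uses the nearest-rank quantile of the observed FST values as threshold.
    """
    ordered = sorted(fst_values)
    rank = max(1, math.ceil(quantile * len(ordered)))
    threshold = ordered[rank - 1]
    return [i for i, v in enumerate(fst_values) if v > threshold]


# Hypothetical allele frequencies at four loci in cultivated and wild groups.
cultivated = [0.3, 0.5, 0.2, 0.0]
wild = [0.3, 0.5, 0.8, 1.0]
fst = [fst_per_locus(a, b) for a, b in zip(cultivated, wild)]
# A low quantile is used here only because the toy data set has four loci.
print(flag_outliers(fst, quantile=0.75))
```

Loci with identical frequencies in both groups give FST = 0, a locus fixed for alternative alleles gives FST = 1, and only the most differentiated loci exceed the threshold, which mirrors the expectation that adaptive SNPs show higher differentiation than neutral SNPs.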

Wild enset propagates by seed under natural conditions, while cultivated enset is propagated only vegetatively by local farmers (Borrell et al. 2019; Brandt et al. 1997; Shigeta 1992). The genetic differences between wild and cultivated enset populations can most probably be attributed to their different reproduction systems (Birmeta et al. 2004; Gerura et al. 2019; Olango et al. 2015; Tesfamicael et al. 2020). Continued vegetative propagation during cultivation can lead to loss of sexual reproduction capacity (Denham et al. 2020); thus, flowering, seed development, seed size, and the numbers of viable seeds per fruit and per infructescence are important traits that differentiate cultivated and wild enset (Borrell et al. 2019; Brandt et al. 1997; Hildebrand 2001).

In the present study, we identified 12 candidate loci putatively under positive selection, based on FST values showing differentiation above the 99% limit of the confidence interval (Fig. 5, Table 5). Among them, six loci, i.e., E-2488, E-3078, E-298, E-1617, E-3031 and E-3091, might be under direct selection. SNP annotation showed that the putative functions of these candidate loci (Table 5) involve different biological processes, including sexual reproduction and flowering signaling in plants, which are key players in domestication and adaptation (Borrell et al. 2019). E-2488 was identified as a SAUR-like auxin-responsive protein. Small auxin-upregulated RNAs (SAURs) form the largest family of early auxin-responsive genes in higher plants and regulate a wide range of cellular, physiological, and developmental processes (Ren and Gray 2015; Zhang et al. 2021). Most SAUR genes, which are transcriptional targets of auxin response factors (ARFs), regulate cell elongation, at least in seedlings (Sun et al. 2016). Further, Hu et al. (2015) showed higher expression of MaARF genes in the initial days of flowering than at later stages, suggesting crucial roles for the ARF genes in early banana fruit development. E-3078 was identified as an isoflavone synthase gene (IFS), which plays a natural role in plant defense and root nodulation. Manipulating the expression of IFS in legumes improved pathogen and stress responses (Jung et al. 2000). E-298 was identified as a DNA binding with one finger (Dof) protein, a plant-specific transcription factor with multiple roles, such as in seed maturation and germination (Ruta et al. 2020). Further, Dof proteins are involved in the growth and development of banana reproductive organs (Dong et al. 2016; Venkatesh and Park 2015). E-1617 was identified as a serine/threonine-protein kinase (STK). STKs are involved in various developmental processes, such as cell proliferation, modification of cell shape and apoptosis. Proteomic research on somatic embryo development in banana showed that a serine/threonine-protein kinase (spot 17) was highly expressed in mature somatic embryos and that these proteins are associated with pattern formation and tissue specification during embryonic development (Kumaravel et al. 2020). E-3031 was identified as a histone acetyltransferase (HAT). HATs play critical roles in the regulation of chromatin structure and gene expression. Genetic and cytological analyses in Arabidopsis revealed that a double mutation of two HAT genes induced severe defects in the formation of the male and female gametophytes, resulting in an arrest of the mitotic cell cycle at early stages of gametogenesis (Latrasse et al. 2008), which shows their crucial role in cell division. The final SNP, E-3091, was associated with an R2R3-MYB transcription factor. These transcription factors have been shown to play regulatory roles during plant development and in responses to biotic and abiotic stress in banana (Pucker et al. 2020). Further, MaMYB4, an R2R3-MYB repressor transcription factor, negatively regulates anthocyanin biosynthesis in banana (Deng et al. 2021), and MaMYB3 is involved in fruit ripening through modulation of starch degradation (Fan et al. 2018). Moreover, two of the genes identified in our study, the serine/threonine-protein kinase and the MYB transcription factor, were also detected recently in a similar study of enset (Tesfamicael et al. 2020).

Conclusion

Our study detected a significant subdivision between cultivated and wild enset and large molecular diversity within populations, indicating a heterogeneous collection of enset from Ethiopia. Most of the molecular diversity exists within geographical regions and very little between regions. Enset from Sidama and South Omo is genetically more diverse than enset from Gurage. Furthermore, we identified six genes involved in sexual reproduction and flowering signaling that are differentially selected between cultivated and wild enset. These novel findings are useful for the conservation of genetic resources, especially under global climate change, and contribute to the potential discovery of functional genes and genetic mechanisms related to the adaptability of enset to local climatic conditions, especially drought. This is encouraging for the prospect of diversifying crops in regions where enset is not traditionally grown, such as the food-insecure dry north.