Background

In terms of cultivated area and total grain production, sorghum (Sorghum bicolor) is the fifth-most important cereal in the world. It serves as a staple for millions of people in Africa and Asia (Ejeta and Grenier 2005). Africa has become the leading sorghum producer in recent years with an average annual volume contribution of >25 million tons of grain and the area covered by the crop in this continent is larger than in other continents (FAOSTAT 2010). Ethiopia is the third largest producer of sorghum in Africa behind Nigeria and Sudan with a contribution of about 12% of annual production (Wani et al. 2011) and the second after Sudan in the Common Market for Eastern and Southern Africa (COMESA) member countries (USAID 2010). Next to maize and tef, sorghum is the third-most important cereal in Ethiopia (CSA 2012). In Ethiopia, sorghum covers 16% of the total area allocated to grains (cereals, pulses, and oil crops) and 20% of the area covered by cereals (CSA Central Statistical Agency 2011). In 2012 alone, 5.2 million holders produced 3.9 million tons of sorghum grain on 1.9 million hectares of land. More than 95% of this area was covered by landraces. Sorghum is the second most important crop for injera (common leavened flat bread) next to tef (Adugna 2012). The grain is used for the preparation of traditional foods, distilled and undistilled beverages and the biomass is highly valued for construction, fuel and animal feed. The crop grows almost exclusively during the main rainy season, which in most regions extends from March to November/December.

Ethiopia serves as a global reservoir of favorable genes for the various crops, including sorghum [Sorghum bicolor (L.) Moench], for which it is a Vavilovian center of origin and diversity. Ethiopian farmers grow mixed sorghum landraces of diverse forms in their fields for various local purposes. Ethiopian sorghum germplasm has contributed substantially to global agriculture. Singh and Axtell (1973) identified two high-lysine Ethiopian sorghum lines, IS11167 and IS11758. IS 12662C (SC 171), the source of A2 cytoplasm (the sterile line) for the development of hybrids, which belongs to the Caudatum Nigricans group (Guinea race), was also obtained from Ethiopia (Schertz 1977). Moreover, studies identified two sorghum lines native to Ethiopia (B35 and E36-1) as sources of “stay-green” for drought tolerance, which are currently used in marker assisted breeding programs (Rosenow et al. 1983; Reddy et al. 2009). Wu et al. (2006) identified seven sorghum lines of Ethiopian origin as resistant to Greenbug biotype I: ETS2140(PI452752), ETS3447(PI455203), ETS3805(PI455812), ETS4159(PI456490), ETS4167(PI456504), ETS4565(PI457212), and ETS4614-B(PI457314). Another example is E 35–1, a selection from the Ethiopian zera-zera sorghum landrace, which has now been introduced for direct cultivation and into breeding programmes in many countries (IBC Institute of Biodiversity Conservation 2007). Moreover, some superior varieties of Ethiopian origin were released in India, Eritrea, Burkina Faso, Zambia, Burundi and Tanzania (Reddy et al. 2006), showing their contribution to the economy of these countries. As the center of origin and diversity for sorghum, therefore, Ethiopia may harbor unique germplasm that is valuable for crop improvement and worthy of conservation.

The significance of studying the genetic diversity of plants is explained elsewhere (e.g., Mutegi et al. 2011). Over the years, a number of studies have estimated genetic diversity in cultivated sorghum using phenotypic traits (e.g., Zongo et al. 1993; Appa et al. 1996; Ayana and Bekele 1998; Ayana et al. 2000; Dahlberg et al. 2002; Shehzad et al. 2009), allozymes (Aldrich et al. 1992; Ayana et al. 2001), RAPDs (Menkir et al. 1997; Ayana et al. 2000; Agrama and Tuinstra 2003; Nkongolo and Nsapato 2003; Uptmoor et al. 2003), RFLPs (Cui et al. 1995; Yang et al. 1996; Jordan et al. 1998), ISSRs (Yang et al. 1996), AFLPs (Uptmoor et al. 2003; Menz et al. 2004; Geleta et al. 2006; Ritter et al. 2007), genomic SSR markers (Dean et al. 1999; Ghebru et al. 2002; Agrama and Tuinstra 2003; Menz et al. 2004; Casa et al. 2005; Geleta et al. 2006; Barnaud et al. 2007; Deu et al. 2008; Wang et al. 2009; Mutegi et al. 2011; Cuevas and Prom 2013), EST-SSR markers (Ramu et al. 2013), Diversity Arrays Technology (DArT™) (Mace et al. 2008) and SNP markers (Wang et al. 2013; Morris et al. 2013). Some of these studies were based on global and local accessions from gene banks, while others were based on field collections, and most of them reported moderate to high diversity. It should be noted that each of these markers has its own advantages and limitations. Moreover, some studies (e.g., Labeyrie et al. 2014) dealt with the influence of ethnolinguistic and cultural diversity on the patterns of genetic structure of sorghum populations.

Phenotypic traits may not give a reliable estimate of genetic diversity because they are limited in number and subject to environmental influence (van Beuningen and Busch 1997). In contrast, molecular diversity data can potentially bridge conservation and use when employed as a tool for mining germplasm collections for genomic regions associated with adaptive or agronomically important traits (Casa et al. 2005). Simple sequence repeat (SSR) markers are among the markers of choice for population genetic studies because they are highly polymorphic even between closely related individuals within a species (Edwards et al. 1996), transferable between populations (Taramino and Tingey 1996; Gupta et al. 1999), require only a small amount of DNA, and are highly reproducible, codominant, abundant, and fairly evenly distributed throughout the euchromatic region of the genome (e.g., Schlotterer 2004). Information on in situ diversity and genetic structure of cultivated sorghum based on reliable marker systems such as SSRs, while indispensable, is lacking in the center of origin, Ethiopia. This study was designed to fill this gap. It aimed at 1) investigating the genetic diversity of sorghum landraces sampled in situ from three agroclimatic regions of Ethiopia using phenotypic traits and SSR markers; 2) investigating the factors shaping the population genetic structure of sorghum landraces; and 3) suggesting measures to aid efforts of crop improvement and genetic resources conservation.

Materials and methods

Study areas and plant samples

The geographical characteristics of the sample collection sites are presented in Table 1. One hundred sixty plant samples of cultivated sorghum were collected from eight populations in three diverse geographical and agro-climatic regions of Ethiopia in October and November 2009 to study in situ genetic diversity and population structure. Four of the eight populations were collected in Wello in Amhara regional state (one from the South and three from the North administrative zones), two in the Gibe river valley (Oromia regional state), and two in Metekel zone (Benishangul-Gumuz regional state). The different landrace collections are known by different local names (Table 1). These regions were selected for four reasons: 1) the sites cover a broad swath of the range of sorghum cultivation in Ethiopia; 2) sorghum is the dominant crop in these regions; 3) the Wello region has been under recurrent drought, and improved varieties have been introduced into the region as food and seed aid, creating a risk of displacement of the landraces; 4) Metekel and Gibe are high-rainfall (>1000 mm) and fertile settlement areas, mainly for people from the Wello region, so that local sorghum landraces might be displaced by landraces from other regions (e.g., Wello) and by other crop species. In contrast, the Wello collection regions are characterized by low annual rainfall (600-700 mm). All of the regions of collection have high temperatures. In all the regions of collection, long-cycle sorghum landraces are traditionally sown in March/April for the main rainy season and harvested in November/December. Each site of collection was considered to represent a population, from which 20 plants were sampled. Each population was a mixture of different landraces collected from three to five farmers’ fields. The names of the dominant landraces based on farmers’ assignment in each site are presented in Table 1. 
The coordinates and altitudes of the collection sites were recorded with a GPSmap 60CSx Global Positioning System (GPS) receiver (Garmin) and were later overlaid onto the regional map of Ethiopia using ArcGIS version 9.3 (Figure 1).

Table 1 Geographical characteristics of the sorghum collection sites and names of the dominant cultivars
Figure 1

Map of Ethiopia showing sorghum landrace collection sites. Key of abbreviations of the regional states: AA = Addis Ababa, AF = Afar, AM = Amhara, BG = Benishangul-Gumuz, DD = Dire Dawa, GA = Gambella, HA = Harari, OR = Oromiya, SN = Southern Nations, SO = Somali, and TI = Tigray.

DNA extraction, PCR amplification and genotyping

Leaf squashes were collected in situ from mature plants using Whatman® FTA® plant saver cards. Extraction and purification of DNA samples were performed using a two-step protocol developed by the manufacturer and optimized for sorghum by Adugna et al. (2011). DNA extraction and the subsequent molecular marker analysis were carried out at the Stanley J. Aronoff Laboratory, Ohio State University, Columbus, Ohio. PCRs were run using 12 sorghum microsatellite loci that were previously mapped (Brown et al. 1996; Taramino et al. 1997; Bhattramakki et al. 2000; Li et al. 2009) and represented all of the 10 sorghum linkage groups (Additional file 1: Table S1). These loci were selected based on their high polymorphism and compatibility for multiplexing. PCR followed the QIAGEN® multi-master mix kit protocol for SSR multiplex, and forward primers were labeled with different fluorescent dyes: FAM (6-carboxyfluorescein), HEX (hexachloro-6-carboxyfluorescein), or NED (2, 7, 8′-benzo-5′-fluoro-2′, 4,7-trichloro-5-carboxyfluorescein) (PE-Applied Biosystems, Foster City, CA). PCR was carried out in a 12 μl total reaction volume containing 1 μM of each primer pair in a multiplex, 1 μl of template DNA, 2.6 μl of sterile ddH2O, and 6 μl of QIAGEN® Multiplex PCR 2X Master mix. Polymerase chain reactions were run in a Mastercycler (Eppendorf™) with an initial denaturation step of 15 min at 95°C, followed by 35 cycles of 30 sec at 94°C, 90 sec at 58°C, and 60 sec at 72°C, then a final extension of 30 minutes at 60°C and a hold at 4°C, following the QIAGEN® protocol for microsatellite multiplexes.

To determine SSR fragment sizes, 2 μl of the PCR product was diluted with 14 μl of ddH2O and then 2 μl of the diluted PCR product was added to 14 μl of 36:1 Hi-Di-Formamide: GenScan™/350 Rox™ size standard in a 96 well microtiter plate and was denatured at 95°C for 5 minutes and cooled on ice for at least 5 minutes. Allele size scoring of the PCR fragments was done by ABI 3100 Genetic Analyzer (DNA sequencer) and sizes were read using the associated GeneMapper 3.7 software (Applied Biosystems Inc., CA, USA) and manually scored. To exclude the possible effects of imprecise DNA fragment sizes due to stuttering, large allele drop out, or null alleles on genotyping, the software Allelobin (Prasanth et al. 2006) was used to classify observed SSR allele sizes into representative discrete allele sizes using a variation of the least-square minimization algorithm of Idury and Cardon (1997).

Data recording and statistical analysis

Quantitative phenotypic measurements

To estimate phenotypic diversity, data were measured from 160 cultivated S. bicolor individual plants (20 plants per site, which represents a population) in the field on seven common quantitative phenotypic traits following descriptors for sorghum (IBPGR/ICRISAT 1993). The measured traits were: head length (HDL) and width (HDW) (cm), flag leaf length (LL) and width (LW) (cm), total leaf number (LN) on main stalk, plant height (PLHT) (cm), and number of tillers (TIL). The quantitative phenotypic data were scaled to fit a normal distribution and subjected to simple descriptive statistics. Pearson’s coefficients of correlations were computed between all pairs of traits and their significance was tested using a t-test. Values from the correlation matrix were used to perform PCA using GenStat software.
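The correlation step described above can be illustrated with a minimal sketch. It computes Pearson's r for one pair of traits and the corresponding t statistic, t = r·sqrt((n − 2)/(1 − r²)), with n − 2 degrees of freedom. The function names and the five-plant data set are hypothetical and are not taken from the study; the original analysis used GenStat, whose internals are not reproduced here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length with n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: rho = 0, to be compared with t on n - 2 d.f."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical values for five plants (illustration only, not study data).
plant_height_cm = [250, 300, 350, 400, 450]
leaf_width_cm = [8, 7, 10, 9, 11]

r = pearson_r(plant_height_cm, leaf_width_cm)
t = t_statistic(r, len(plant_height_cm))
```

With 160 plants, as in the study, the same t statistic would be evaluated on 158 degrees of freedom.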

SSR polymorphism and analysis of genetic diversity

To estimate the discriminatory power of the SSR loci, polymorphism information content (PIC) (Botstein et al. 1980; Anderson et al. 1993) was computed using PowerMarker software V3.25 (Liu and Muse 2005). The number and frequency of SSR alleles was also computed using the same software. GENEPOP 4.0 (Rousset 2008) was used to compute exact tests for Hardy-Weinberg equilibrium and for genotypic disequilibrium among pairs of loci. This was also complemented with the HW-QuickCheck computer program (Kalinowski 2006). Nei’s heterozygosity estimates (Ho, Hs, and Ht) were computed using FSTAT software (Goudet 2002). Allelic richness (Rs) and private allelic richness (Rp) were computed using the rarefaction method (Hulbert 1971) implemented in HP-Rare 1.1 software (Kalinowski 2005). Significance of differences in the overall gene diversity, allelic richness and private allelic richness between populations and among the regions of collection was tested using a nonparametric Wilcoxon signed ranks test (Wilcoxon 1945) implemented in SPSS Statistics software release 17.
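As an illustration of the locus-level indices named above, the sketch below computes PIC in the form given by Botstein et al. (1980), PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j², together with gene diversity (He = 1 − Σp_i²). The effective number of alleles is computed as 1/Σp_i², which is the usual definition but is an assumption here, since the text does not state the formula used. Function names and the example frequencies are hypothetical.

```python
def _check(freqs):
    """Allele frequencies at one locus must be non-negative and sum to 1."""
    if any(p < 0 for p in freqs) or abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must be >= 0 and sum to 1")

def gene_diversity(freqs):
    """Expected heterozygosity, He = 1 - sum(p_i^2)."""
    _check(freqs)
    return 1.0 - sum(p * p for p in freqs)

def effective_alleles(freqs):
    """Effective number of alleles, Ae = 1 / sum(p_i^2) (assumed definition)."""
    _check(freqs)
    return 1.0 / sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphism information content (Botstein et al. 1980)."""
    _check(freqs)
    squares = [p * p for p in freqs]
    cross = 0.0
    for i in range(len(squares)):
        for j in range(i + 1, len(squares)):
            cross += 2.0 * squares[i] * squares[j]
    return 1.0 - sum(squares) - cross

# Hypothetical locus with two equally frequent alleles.
example = [0.5, 0.5]
example_pic = pic(example)  # 1 - 0.5 - 2 * 0.25 * 0.25 = 0.375
```

PIC is always slightly lower than He at the same locus, and the gap shrinks as the number of alleles grows.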

Population structure and gene flow

To estimate the components of variance among regions of collection, and among and within populations, analysis of molecular variance (AMOVA) was computed using Arlequin v 3.1 software (Excoffier et al. 2005). To investigate population differentiation, Wright's (1951) fixation index (FST) of the total populations and pairwise FST among all pairs of populations were computed using FSTAT software (Goudet 2002) and significance was tested based on 10000 bootstraps. Gene flow was estimated using an indirect method based on the number of migrants per generation (Nm), calculated as (1 − FST)/(4FST). A shared alleles distance matrix (Jin and Chakraborty 1994) was used to construct a Neighbor-joining dendrogram for the 160 samples belonging to the eight populations using PowerMarker (Liu and Muse 2005), and the resulting tree was viewed using TreeView 1.6.6 (Page 2001, available at http://taxonomy.zoology.gla.ac.uk/rod/rod.html). Further, the pattern of population structure and admixture was visualized using a Bayesian model-based clustering method implemented in STRUCTURE software, Version 2.2 (Pritchard et al. 2000). For this, two separate analyses were run, with and without prior information about the populations. The first assigned the site of collection as the putative population origin for each individual; the second gave no such information and let the STRUCTURE program assign each individual to a population. The admixture model with correlated allele frequencies was used as suggested in the manual. A burn-in period of 10,000 was used followed by 10,000 Markov Chain Monte Carlo (MCMC) replications for data collection for K = 1 to K = 8 groups. For each K value, 10 replicates were run. This procedure clusters individuals into populations and estimates the proportion of membership in each population for each individual (Falush et al. 2003). 
The optimum number of clusters was predicted between K = 1 and K = 8 following the simulation method of Evanno et al. (2005) using the web based software STRUCTURE HARVESTER v0.6.92 (Earl and von Holdt 2012).
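The ΔK criterion of Evanno et al. (2005) can be sketched as follows. For each K with neighbours on both sides, this simplified version takes ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)], where L(K) is the mean log-likelihood over replicate runs and sd is taken across the replicates at K. Computing the second difference from replicate means is an assumption of this sketch and is not a description of STRUCTURE HARVESTER's code; the log-likelihood values are invented for illustration.

```python
import statistics

def delta_k(lnp_by_k):
    """Return {K: deltaK} from {K: [lnP(D) of each replicate run]}.

    deltaK is undefined for the smallest and largest K tested, so those
    values of K are omitted from the result.
    """
    means = {k: statistics.mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if (k - 1) in means and (k + 1) in means:
            sd = statistics.stdev(lnp_by_k[k])  # spread across replicates
            if sd > 0:
                second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
                result[k] = second_diff / sd
    return result

# Hypothetical replicate log-likelihoods (two runs per K, illustration only).
runs = {
    1: [-100.0, -102.0],
    2: [-50.0, -52.0],
    3: [-48.0, -50.0],
    4: [-47.0, -49.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

Because ΔK needs both K − 1 and K + 1, testing K = 1 to K = 8 yields ΔK values only for K = 2 to K = 7, and the method cannot by itself support K = 1.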

To study the pattern of gene flow, Slatkin’s pairwise FST matrix was first converted into a matrix of Rousset's (1997) genetic distance, FST/(1 − FST). The geographic distances among the collection sites were computed from the GPS coordinates using the web-based Geographic Distance Matrix Generator (GDMG) version 1.2.3 of the American Museum of Natural History, Center for Biodiversity and Conservation (http://biodiversityinformatics.amnh.org/open_source/gdmg/index.php). The correlation between Rousset’s genetic distance matrix and the geographic distance matrix of the collection sites was then calculated using the web-based program IBDWS version 3.23 (available at http://ibdws.sdsu.edu/~ibdws/). Significance of the correlation was tested using the Mantel (1967) test. Moreover, reduced major axis (RMA) regression (Hellberg 1994) was used to calculate the intercept and slope of the relationship between the genetic and geographic distance matrices for inference of isolation by distance.
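The isolation-by-distance workflow described above can be sketched in a few functions: the Rousset transform FST/(1 − FST), a one-tailed permutation Mantel test, and an RMA fit whose slope is sign(r)·sd(y)/sd(x). This is a minimal illustration under stated assumptions, not a re-implementation of IBDWS; the function names, the permutation count, the seed, and the four-site distance matrix are all hypothetical.

```python
import math
import random

def rousset_distance(fst):
    """Rousset (1997) genetic distance, FST / (1 - FST)."""
    return fst / (1.0 - fst)

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def upper_triangle(m, order=None):
    """Off-diagonal upper-triangle entries, optionally with sites relabelled."""
    n = len(m)
    idx = order if order is not None else list(range(n))
    return [m[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]

def mantel(gen, geo, permutations=999, seed=1):
    """One-tailed Mantel test: returns (observed r, permutation p-value)."""
    rng = random.Random(seed)
    x = upper_triangle(geo)
    r_obs = pearson_r(x, upper_triangle(gen))
    order = list(range(len(gen)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)  # relabel sites in the genetic matrix
        if pearson_r(x, upper_triangle(gen, order)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)

def rma(x, y):
    """Reduced major axis regression: returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sdx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sdy = math.sqrt(sum((b - my) ** 2 for b in y))
    slope = math.copysign(sdy / sdx, pearson_r(x, y))
    return slope, my - slope * mx

# Hypothetical example: four sites on a line at 0, 10, 30 and 60 km.
pos = [0.0, 10.0, 30.0, 60.0]
geo = [[abs(a - b) for b in pos] for a in pos]
gen = [[0.01 * d for d in row] for row in geo]  # perfectly proportional
r_obs, p_value = mantel(gen, geo)
```

Unlike ordinary least squares, the RMA slope does not shrink toward zero when the correlation is weak, which is why the slope and R² should be read together.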

Results

Diversity of quantitative phenotypic traits

Considerable variation was observed among the populations for the measured quantitative phenotypic traits. The number of tillers was in the range of zero to five. Head length was as small as 11 cm and as large as 46 cm in some cultivars and head width was in the range of five to 40 cm with mean 12.1 cm. Plant height was in the range of 147 cm (in an improved lowland variety, 76T1#23, coded Wello-2) to 470 cm (Metekel-1) averaging 289 cm. Six of the eight populations showed an average height of greater than 3 m. Leaf width ranged from 4.4 cm (Wello-2) to 12.5 cm (Metekel-1). Leaf length was also in the range of 42 cm to 100 cm and leaf number was in the range of six (Wello-2) to 24 (Wello-4) (Table 2).

Table 2 Simple descriptive statistics and principal component factor loadings of the measured quantitative phenotypic traits

Correlation was significant in all pairs of characters (p < 0.05), except between number of tillers and head length, head width, and leaf length; between head length and leaf number; and between leaf length and leaf number. Plant height and leaf width had highly significant positive correlations with the remaining quantitative phenotypic traits. The first three principal component axes explained 80.53% of the total variation. Plant height contributed the largest factor loading (0.44) for PCA1. PCA2 was mostly influenced by number of tillers per plant (0.56). For PCA3, leaf length contributed the largest share of the variation (0.40). Figure 2 shows the pattern of phenotypic diversity in the 160 plant samples from the eight populations based on the first two principal components. Four clusters are clearly observed in this biplot. Cluster I consisted of individuals from the W2 population. Cluster II was composed of individuals from the Wello populations W1 and W4. Cluster III was dominated by the Gibe populations, G1 and G2, with some phenotypically similar individuals from the M1 population. Cluster IV was mainly composed of individuals from W3. The Metekel populations, M1 and M2, had individuals represented in all of the clusters.

Figure 2

PCA biplot showing the distribution of the 160 sorghum samples based on their measured phenotypes.

SSR polymorphism

Availability of alleles in each locus (the proportion of loci without missing alleles) ranged from 0.93 (Sb4-121) to 1.0 (Sb5-206, Sb1-1, Sb6-34, and Sb4-72) with mean 0.97. All of the 12 SSR loci were highly polymorphic with PIC values ranging from 0.38 to 0.85 (mean = 0.62) (Table 3). All except two SSR loci had PIC values greater than or equal to 0.5. They produced a total of 123 alleles of which 78 (63.4%) were rare (with frequency ≤ 0.05). The number of alleles produced per polymorphic locus ranged from 3 to 27 with an average of 10.25. The effective number of alleles was also in the range of 1.7 to 7.5 (Table 3). The frequency of the major allele was in the range 0.24-0.75 with a mean of 0.47. A comparison of SSR size ranges from the previously published reports and observed in the present study is presented in Additional file 1: Table S1. Tests for Hardy–Weinberg equilibrium (HWE) for all loci and all populations revealed that they did not significantly deviate from HWE.

Table 3 Diversity indices of the SSR loci used in the study (N A = observed number of alleles; A e = effective number of alleles; R s = allelic richness; H o = average observed heterozygosity; H e = expected heterozygosity/gene diversity; PIC = polymorphism information content)

Genetic diversity

The values of the various genetic diversity indices of the eight populations are presented in Table 4. Average observed heterozygosity (Ho) was in the range of 0.05-0.32 with mean 0.13 across all loci. Gene diversity was the lowest in the Gibe-1 population (He = 0.20) and the highest in the Wello-2 population (He = 0.70), and its value averaged over all populations and loci was 0.67 (SD = 0.11). The W2 population also had the highest allelic richness. Differences in allelic richness and private allelic richness over all pairs of populations and loci were significant (p < 0.05). The M1 population had the highest private allelic richness (Rp = 0.83) and G1 the lowest (Rp = 0.11). Wello as a region of collection supported the highest gene diversity (He = 0.70) whereas Gibe was the lowest (He = 0.40); differences were significant between Gibe and Wello (Z = −3.06, p = 0.002) and between Metekel and Wello (Z = −2.35, p = 0.01). Allelic richness was the lowest in Gibe (Rs = 3.9) and the highest in Wello (Rs = 6.8); differences were significant between Gibe and Metekel (Z = −2.13, p = 0.03) and between Gibe and Wello (Z = −2.82, p = 0.006), but not between Metekel and Wello (Z = −1.78, p = 0.08). Similarly, differences in private allelic richness were significant between Gibe and Metekel (Z = −2.5, p = 0.01) and between Gibe and Wello (Z = −2.67, p = 0.008), but not between Metekel and Wello (Z = −0.71, p = 0.48).
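The Z values above come from Wilcoxon signed-rank tests on values paired by locus. A minimal sketch of how such a Z can be obtained is given below: zero differences are dropped, absolute differences are ranked with average ranks for ties, and the smaller rank sum W is standardized with the normal approximation, Z = (W − n(n+1)/4) / sqrt(n(n+1)(2n+1)/24). The sketch applies no tie or continuity correction, so it may differ slightly from SPSS output; the function name and the example values are hypothetical.

```python
import math

def wilcoxon_signed_rank(a, b):
    """Return (W_plus, W_minus, Z) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ordered = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a run of tied absolute differences
        while j + 1 < n and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return w_plus, w_minus, (w - mean) / sd

# Hypothetical per-locus values for two regions (illustration only).
region_a = [2, 4, 6, 8, 10]
region_b = [1, 2, 3, 4, 5]
w_plus, w_minus, z = wilcoxon_signed_rank(region_a, region_b)
```

In the study each comparison would pair the 12 loci, so n is at most 12 and the normal approximation is only approximate at that sample size.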

Table 4 Summary of the population diversity indices averaged over the 12 loci (N A = number of alleles per polymorphic locus, A p = number of private alleles, R s = allelic richness, H o = average observed heterozygosity, H e = gene diversity)

Population genetic structure and gene flow

AMOVA showed 54.44% of the variation to be within populations, 32.76% among populations within regions, and 12.8% among the regions of origin (FST = 0.40, p < 0.001) (Table 5). Pair wise FST values among all populations were significant (p < 0.001) (Table 6). The divergence among the regions of collection was also high (FST = 0.21, p = 0.02). The Neighbor-joining dendrogram grouped the 160 individuals of the eight populations into three major clusters (Figure 3). Accordingly, Cluster I consisted of individuals from the improved early maturing variety, 76 T1#23 (Wello-2 population). Cluster II joined the two populations from Metekel (Metekel-1 and Metekel-2), a population from Wello (Wello-1) and Gibe-1 population. The third cluster (cluster III) consisted of individuals from the two adjacent Wello populations (Wello-3 and Wello-4) and Gibe-2 population. This pattern of clustering was also similar to the principal component biplot (Figure 4). Evanno et al. (2005) method on STRUCTURE outputs predicted K = 2 to be the most likely number of clusters (Figure 5). STRUCTURE with and without prior information on the populations gave similar clustering (K = 2). With no prior information, 73 (46%) of the total 160 individual plants were grouped in cluster I with ≥0.90 probability of membership whereas 71 (44%) of them were grouped in cluster II with the same probability of membership. Assigning the site of collection as the putative population origin for each individual (with prior information) resulted in exactly the same result as above (Figure 6). In such a case, both clusters contained 6 to 20 members of five populations each with ≥0.90 coefficient of ancestry. All of the 20 (100%) individuals of each of Metekel-1 and Gibe-1 populations, and 17 (85%) of individuals of Metekel-2 population were grouped in Cluster I (Additional file 2: Table S2). The number of migrants per generation as an indirect estimate of gene flow was very low (Nm = 0.38) in the overall populations. 
However, gene flow as high as Nm = 3.66 was observed in the adjacent Metekel populations (M1 and M2).
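The indirect gene flow estimates above follow from the relation Nm = (1 − FST)/(4FST) given in the methods. The short sketch below applies it to the overall FST and inverts it to show the FST implied by a given Nm; the function names are hypothetical.

```python
def nm_from_fst(fst):
    """Indirect estimate of migrants per generation, Nm = (1 - FST) / (4 FST)."""
    if not 0.0 < fst < 1.0:
        raise ValueError("FST must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)

def fst_from_nm(nm):
    """Inverse relation, FST = 1 / (4 Nm + 1)."""
    if nm <= 0.0:
        raise ValueError("Nm must be positive")
    return 1.0 / (4.0 * nm + 1.0)

overall_nm = nm_from_fst(0.40)   # about 0.375 for the overall FST of 0.40
metekel_fst = fst_from_nm(3.66)  # about 0.064 for the Metekel pair
```

An overall FST of 0.40 gives Nm of about 0.375, in line with the reported 0.38, while Nm = 3.66 between the two Metekel populations corresponds to a pairwise FST of roughly 0.064.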

Table 5 Analysis of molecular variance (AMOVA) among the sorghum regions of collection, among the populations within geographical regions, and within sorghum landrace populations
Table 6 Pair wise F ST matrix, a measure of population divergence among the sorghum landrace populations (all pairs were significant, p < 0.001)
Figure 3

Neighbor-joining radial tree showing the clustering pattern of individual samples from the eight sorghum landrace populations.

Figure 4

Principal component (PCA) biplot of the 160 sorghum samples based on correlation of SSR allele frequencies.

Figure 5

A biplot detected the maximum peak at K = 2 (the optimum number of clusters) based on Evanno et al. (2005) prediction.

Figure 6

STRUCTURE bar graphs of the 160 individual sorghum plant samples in eight pre-determined populations (x-axis) at K = 2. Figures in the y-axis show coefficient of membership/assignment.

Mantel test for the correlation between Rousset’s genetic distance and the geographic distance matrices was weak, but significant (r = 0.272, p = 0.020). Moreover, the reduced major axis (RMA) regression showed a significant relationship with an intercept (−0.2936 ± SE0.2290, 1000 bootstraps over individual pairs), slope (0.003 ± SE0.0006) and with coefficient of determination (R2 = 0.074) (Figure 7).

Figure 7

RMA regression of Rousset’s genetic distance matrix plotted against the geographic distance (km) matrix of the sorghum landrace collection sites in Ethiopia (Y = 0.00343X − 0.2936; R2 = 0.074; p < 0.001).

Discussion

Diversity of quantitative phenotypic traits

It is well known that the majority of the Ethiopian sorghum landraces are characterized by high biomass (tall height, large leaf area and large number of leaves). All of the populations included in the present study displayed such characters except the only improved exotic variety, 76 T1#23, whose measurements deviated from this pattern. For instance, all except Wello-2 and Wello-3 populations exhibited average height between 300 and 360 cm. The Wello-3 (Jigurte) population is relatively shorter and earlier maturing than the high-yielding and previously dominant cultivar called Degalit (Wello-4). Jigurte is now becoming the dominant cultivar in the Kobo-Alamata plain because it matures earlier than the other landraces and is better suited to the changing climate, mainly the unreliable rainfall in the region. Some farmers call it “America”, perhaps because it was introduced from another place decades ago. The observed high variation in the range of the quantitative phenotypic traits in all populations could have a genetic basis or could be due to phenotypic plasticity. If the latter, it is more likely to reflect differences in rainfall and temperature, as there was too little variation in the altitude of the collection sites (1088-1500 m) to bring about such changes. Ayana et al. (2000) studied the geographical pattern of quantitative phenotypic traits in Ethiopian and Eritrean sorghum gene bank accessions. They found that the variation within and among geographical regions was high and suggested that gradients of growing period, rainfall and temperature are more important for such variation and should be considered during future germplasm collection.

SSR polymorphism and genetic diversity

The observed SSR fragment sizes were within the range of the sizes in previously published reports in sorghum (Brown et al. 1996; Dean et al. 1999; Ghebru et al. 2002; Agrama and Tuinstra 2003; Abu-Assar et al. 2005). This set of primer pairs is highly polymorphic and is being used for genetic fingerprinting as well as marker assisted breeding programs.

Although the present in situ germplasm set was not compared with historical gene bank accessions, the high genetic diversity observed in Wello populations compared to the populations from other regions may indicate that the sorghum genetic diversity in this region is still in good condition. This may show that farmers can conserve their landraces even during harsh drought seasons. However, it does not reveal changes in the historical genetic diversity of the region. The highest diversity in the Wello-2 population, representing an exotic improved variety (76 T1#23), was rather unexpected. This variety was released in 1976 and has long been distributed to farmers all over the country. Thus, the variety may have been mixed with the landraces and lost its genetic purity. As a region of collection, Gibe populations were found to have the lowest diversity and Wello populations the highest in terms of allelic richness and number of private alleles. The significantly lower genetic diversity indices of the Gibe and Metekel populations than those of the Wello populations may indicate some level of genetic drift (although the SSR analysis did not confirm this) during sampling of the seeds by migrants during settlement. Farmers usually carry few heads when they migrate and these may represent only a few genotypes. Gibe and Metekel areas had no history of sorghum production before settlement.

The extent of the gene diversity of the studied Ethiopian sorghum populations (He = 0.66) was similar to that of Kenyan sorghum accessions (He = 0.66) (Ngugi and Onyango 2012), Niger sorghum (He = 0.613) (Deu et al. 2008) and Eritrea sorghum (He = 0.65) (Ghebru et al. 2002), but higher than Morocco sorghum (He = 0.29) (Djé et al. 2000) and (He = 0.32) (Barnaud et al. 2007) using similar SSR markers. However, these comparisons are not strictly equivalent, as the number of samples and the sampling strategy differed. Similarly, the observed heterozygosity (Ho = 0.13) was comparable to that in Djé et al. (2000) (Ho = 0.134), slightly higher than in Barnaud et al. (2007) (Ho = 0.11), and much higher than in Deu et al. (2008) (Ho = 0.042). Although there was no significant departure from HWE, the observed heterozygosity was much lower than the expected heterozygosity/gene diversity. In congruence with this study, Nybom (2004) compiled 79 microsatellite based studies and found that the grand mean for Ho was lower than that for He in 64 of these studies. Similarly, most of the genetic diversity studies in sorghum using SSRs (e.g., Ghebru et al. 2002; Barro-Kondombo et al. 2010; Deu et al. 2010; Ngugi and Onyango 2012) supported this finding.

Cuevas and Prom (2013) studied the genetic diversity and population structure of 137 sorghum accessions of Ethiopian origin preserved at USDA-ARS National Plant Germplasm System (NPGS) using 20 SSRs and found observed and expected heterozygosity of 0.23 and 0.78, respectively. These figures are higher than our findings. Even though ex situ accessions can sometimes experience loss of variability associated with missing of low frequency alleles (<1%) during repeated regeneration (e.g., Wilkes, 1989; Adugna et al. 2013), the major difference in the diversity of the present study and Cuevas and Prom (2013) was perhaps in the sampling strategy including the sampling area and period. Sorghum grows almost everywhere in Ethiopia between altitude range of 500 and 2400 m. However, our sampling mainly focused only on three geographical regions and individual plant sample collections were done in 3–5 farmers’ fields. Cuevas and Prom (2013) mentioned that most of the accessions they used had no passport data and thus there is a possibility that they could be countrywide collections. It is possible that including more locations may increase the chance of getting higher diversity.

Population structure and gene flow

Some populations of landraces with different folk names, like Degalit and Jigurte from Wello, which are morphologically different, were not found to be distinct using SSRs. This could be attributed to different reasons. First, they may not be genetically distinct from each other, in which case the morphological differences between these cultivars may have little genetic basis; instead, they could be due to farmers’ directional selection for different morphological traits for different purposes. Another reason could be that the observed morphological differences might not be detected using neutral genetic markers. Similar observations were made in Mali, Guinea and Kenya, where varieties defined as different based on their vernacular/folk names or collection sites were in fact very closely related using SSR markers (Sagnard et al. 2011; Labeyrie et al. 2014). Collections in Metekel and the Gibe valley were made in settlement villages composed mainly of people from Wello. Therefore, as expected, clustering together of Metekel-1 and Metekel-2 populations with Wello-1 and Gibe-1 populations might be due to long distance seed movement with settlers. Surprisingly, Gibe populations displayed more affiliation with Wello and Metekel populations than with each other. High gene flow was also observed between Wello-1 and Metekel-2 (Nm = 2.25) and between Wello-1 and Metekel-1 (Nm = 1.14) populations, comparable to that between the adjacent Metekel-1 and Metekel-2 populations (Nm = 3.66). However, gene flow in the overall populations was very low, in contrast to what was reported for the ex situ accessions of Ethiopian origin conserved at USDA-ARS (Cuevas and Prom 2013).

A Mantel test of the correlation between Rousset’s genetic distance and the geographic distance matrices shows only whether the relationship is significant; hence, the slope and intercept of this relationship should be estimated using regression techniques (Bohonak 2002). Among the regression techniques, reduced major axis (RMA) regression is reportedly better suited to the analysis of isolation by distance (Hellberg 1994). RMA regression of the genetic distance among the sorghum landraces on the geographic distance between the collection sites revealed a weak but significant relationship (slope = 0.003 ± 0.0006 SE). This shows that gene flow among populations follows a pattern of isolation by distance (IBD) in a two-dimensional stepping-stone model: the farther apart the populations are located, the weaker their relationship. However, long-distance seed movement, as has already occurred from the Wello area to Metekel with settlers, could be the major force shaping the genetic structure of the landrace populations. Thus, we believe that the population genetic structure of the studied landraces was strongly influenced by human migration, as evidenced in Figure 3. The pattern of genetic structure observed today is known to be the result of the history of domestication and human migrations (Sagnard et al. 2011).
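
The two-step analysis described above (a Mantel test for significance, then RMA regression for slope and intercept) can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in this study: the site coordinates and genetic distances are simulated, the true slope of 0.003 is built into the toy data merely to echo the reported value, and the function names are hypothetical.

```python
import numpy as np


def upper(m):
    """Flatten the upper triangle (excluding the diagonal) of a square matrix."""
    i, j = np.triu_indices_from(m, k=1)
    return m[i, j]


def mantel(gen, geo, n_perm=999, seed=0):
    """One-sided Mantel test: permute rows and columns of gen together."""
    rng = np.random.default_rng(seed)
    x = upper(geo)
    r_obs = np.corrcoef(x, upper(gen))[0, 1]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(gen.shape[0])
        r = np.corrcoef(x, upper(gen[np.ix_(p, p)]))[0, 1]
        hits += r >= r_obs
    return r_obs, (hits + 1) / (n_perm + 1)


def rma(x, y):
    """Reduced major axis regression: slope = sign(r) * sd(y) / sd(x)."""
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
    return slope, np.mean(y) - slope * np.mean(x)


# Simulated data: 8 hypothetical sites on a 500 km square (not the study sites).
rng = np.random.default_rng(1)
coords = rng.uniform(0, 500, size=(8, 2))
geo = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
noise = np.triu(rng.normal(0, 0.02, geo.shape), k=1)
gen = 0.05 + 0.003 * geo + noise + noise.T  # symmetric toy genetic distances
np.fill_diagonal(gen, 0.0)

r_obs, p_value = mantel(gen, geo)
slope, intercept = rma(upper(geo), upper(gen))
print(f"Mantel r = {r_obs:.3f}, p = {p_value:.3f}, RMA slope = {slope:.4f}")
```

Permuting rows and columns jointly preserves the dependence among pairwise distances that share a population, which is why the significance test is done on matrices and the slope is estimated separately.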

Implications for crop improvement and genetic resources conservation

The importance of crop diversity in counteracting genetic vulnerability, and the ways in which plant breeding, plant variety legislation, and an expanding seed industry may influence genetic diversity, are well discussed elsewhere (e.g., Brown 1983). It has been argued that, because of recurrent drought in some of the major sorghum-growing regions of the country, the diversity of the crop is declining over time: farmers in the dry lowlands tend to use high-yielding, improved, early-maturing sorghum varieties or to shift their production systems to more vulnerable and low-yielding early-maturing crop species such as tef (Eragrostis tef), the dominant cereal, which might have resulted in genetic erosion of the sorghum landraces in these regions. The adoption of early-maturing improved varieties was also found to be high in two such areas affected by recurrent drought, Kobo in the north-eastern and Mieso in the eastern part of the country (Bekele et al. 2013). However, inadequate seed supply and lack of promotion prevent the improved varieties from spreading further into other regions of the country. Seed supply in the latter regions is inadequate partly because farmers decide to plant improved sorghum varieties late in the season, only when the seed of their landraces fails to emerge. At that time, improved seed is not available, apart from a few kilograms that research institutes provide to some farmers for testing. When planting the improved seed, farmers usually do not remove the remnant plants of their landraces from the previous planting. As a result, the harvest from such fields does not provide quality seed for the next season’s planting. For this reason, and because of the lack of isolation from neighboring fields, the improved seed usually cannot be used for more than two cycles. Another reason for the lack of widespread adoption of improved varieties in most regions is the subsistence nature of poor sorghum farmers’ livelihoods. 
They cannot afford to buy seeds of improved varieties. Because the improved varieties are usually planted on small plots, farmers who harvest only from them consume everything they have harvested, no matter how much they like the varieties. Thus, they have no seed left for the coming season and the problem persists. Unlike the improved seeds, seeds of the landraces can be shared freely, exchanged in kind or purchased from the market at any time of the year.

In some areas, such as the extreme North Wello (where we collected the Wello-2, 3, and 4 populations), sorghum was once the dominant crop and highly diverse. At present, however, tef is the dominant crop species in this area, and wherever sorghum is still grown, only a few cultivars that can cope with the changing climate dominate. Shewayrga et al. (2012) demonstrated loss of diversity in sorghum landraces in this region of Wello by comparing historical accessions preserved in gene banks for 30 years (originally collected in 1973) with in situ collections (newly collected in 2003).

The current Ethiopian sorghum germplasm holdings at the Ethiopian Institute of Biodiversity (EIB) have reached 9432 (http://www.ibc.gov.et/biodiversity/conservation/database-ms). This number is very small compared to the germplasm preserved elsewhere. For instance, more than 7000 germplasm accessions of Ethiopian origin are preserved at the US National Plant Germplasm System (Erpelding and Prom 2009) and another 4500 at the ICRISAT genebank (IBC Institute of Biodiversity Conservation 2007). Moreover, even though the Ethiopian germplasm has been serving the world as a source of valuable genes or for direct cultivation, the Ethiopian research system has not yet fully utilized these resources. As a result, there has been little success in breeding farmer-preferred sorghum varieties in Ethiopia, owing to the mismatch between farmers’ preferences and breeders’ selection criteria. Over the past four decades, more than 40 varieties have been released for the different agroecologies, except for the wet lowlands of Ethiopia, including Metekel zone, which was covered by this study. However, none of the released varieties has been widely taken up by farmers. Because the wet lowlands combine high moisture (humidity) and high temperature, they favor the development of various fungal leaf and head diseases that attack sorghum. The breeding program has depended almost exclusively on introduced germplasm, which is short in height and early in maturity, and little attention has been given to the landraces. However, there are landraces well adapted to the various sorghum-growing environments owing to co-evolution with the changing climate, insect pests, striga (the parasitic weed), and the pathogens of common diseases. Admittedly, some of these landraces have limitations, such as poor grain quality and maturity periods of up to nine months. On the other hand, landraces of better quality are also found. 
Therefore, future sorghum improvement should focus on improving the landraces. For instance, in the present study, crossing the distinct Wello populations (Wello-3 and Wello-4) with some of the other populations included in this study may produce good combinations for selecting progenies with desirable characteristics for use as varieties in the wet lowlands, as they are genetically distant from one another. Moreover, reintroducing ex situ germplasm to its original places of collection, which are now losing their diversity, may help to restore the lost diversity.

Author’s information

AA has been a senior sorghum and millets breeder in the Ethiopian Institute of Agricultural Research (EIAR) for over 15 years. Currently, he is working for Advanta Seed International, a UPL Group company, as a sorghum breeder for the African continent, based in Ethiopia.