Genome-wide insights of Ethiopian indigenous sheep populations reveal the population structure related to tail morphology and phylogeography

Background Ethiopian sheep living in different climatic zones and having contrasting morphologies are a most promising subject of molecular-genetic research. Elucidating their genetic diversity and genetic structure is critical for designing appropriate breeding and conservation strategies. Objective The study was aimed to investigate genome-wide genetic diversity and population structure of eight Ethiopian sheep populations. Methods A total of 115 blood samples were collected from four Ethiopian sheep populations that include Washera, Farta and Wollo (short fat-tailed) and Horro (long fat-tailed). DNA was extracted using Quick-DNA™ Miniprep plus kit. All DNA samples were genotyped using Ovine 50 K SNP BeadChip. To infer genetic relationships of Ethiopian sheep at national, continental and global levels, genotype data on four Ethiopian sheep (Adilo, Arsi-Bale, Menz and Black Head Somali) and sheep from east, north, and south Africa, Middle East and Asia were included in the study as reference. Results Mean genetic diversity of Ethiopian sheep populations ranged from 0.352 ± 0.14 for Horro to 0.379 ± 0.14 for Arsi-Bale sheep. Population structure and principal component analyses of the eight Ethiopian indigenous sheep revealed four distinct genetic cluster groups according to their tail phenotype and geographical distribution. The short fat-tailed sheep did not represent one genetic cluster group. Ethiopian fat-rump sheep share a common genetic background with the Kenyan fat-tailed sheep. Conclusion The results of the present study revealed the principal component and population structure follows a clear pattern of tail morphology and phylogeography. There is clear signature of admixture among the study Ethiopian sheep populations Electronic supplementary material The online version of this article (10.1007/s13258-020-00984-y) contains supplementary material, which is available to authorized users.


Introduction
Given its proximity to the Arabian Peninsula, Ethiopia is considered as a corridor for the introduction of livestock species including sheep to the African continent (Hanotte et al. 2002;Muigai and Hanotte 2013). Sheep and their products play a critical role in the livelihood of millions of farmers and pastoral communities in Ethiopia and are important for the national economy (Assefa et al. 2015). Ethiopia possesses highly diversified indigenous sheep populations adapted to highly diverse agro-ecologies and the populations are maintained by difference ethnic communities (Haile et al. 2002;Gizaw et al. 2008). There are about 29.33 million heads of sheep which are phenotypically identified into 14 populations (Gizaw et al. 2008; 1 3 Leta and Mesele 2014). However, the local sheep populations with very low productivity dominate smallholder production systems, which are mainly confounded by lack of effective long-term sheep genetic improvement, multiplication and effective delivery systems, environmental as well as socio-economic factors (Tadele 2010; ). On the other hand, the demand for sheep and their products has increased from time to time due to a growing human population and urbanization. Therefore, it is an urgent need to improve their productivity in order to raise smallholder farmers' incomes, to meet the demand of the growing human population as well as the national economy (FAO 2012).
The local sheep populations in Ethiopia are mainly named after the geographic location or ethnic group, based on phenotypic characteristics or agro-ecology (Solomon et al. 2011). This might have led to group genetically similar populations into phenotypically distinct groups. The previous studies indicated that Ethiopian indigenous sheep were clustered together based on their geographic distribution and tail phenotypes (Gizaw et al. 2007;Edea et al. 2017;Ahbara et al. 2019). Characterizing genetic diversity and understanding of population structure is a key aspect of developing sustainable breed improvement strategies (Groeneveld et al. 2010) and understanding adaptation to extreme environments (Ai et al. 2014). Molecular characterization were done on genetic diversity and population structure of Ethiopian sheep populations using microsatellite markers (Gizaw et al. 2007;Hellen 2015). In a recent study, Edea et al (2017) and Ahbara et al (2019) evaluated population structure of Ethiopian indigenous sheep populations (Kefis, Adane, Arabo, Gafera, Molale, Bonga, Gesses, Kido, Doyogena, ShubiGemo, Loya, Adilo, Menz, Arsi-Bale, Horro and Black Head Somali) and revealed high level of admixture among the populations using high and medium density SNP chip panel, respectively. Recently developed genome-wide ovine SNP array has provided a tool for investigating genetic diversity at high resolution, inferring population history, and mapping genomic regions subject to selection and adaptation (Kijas et al. 2009;Yang et al. 2016;Zhao et al. 2017).
However, there are still indigenous sheep populations in Ethiopia yet to be evaluated at genome-wide. These sheep populations include Farta, Wollo, Washera and Horro sheep which are adapted to diverse agro-ecologies and with different tail phenotype. Therefore, in this study we aimed to investigate genome-wide genetic diversity and population structure of the indigenous sheep populations in Ethiopian.

Sheep populations, sampling and SNP genotyping
Blood samples were collected from four Ethiopian sheep populations (n = 115) adapted to diverse agro-ecological environments. Detail description of the populations is summarized in Table 1. Farta (n = 26) and Wollo (n = 28) sheep are short fat-tailed coarse-wool sheep and adapted to subalpine environments (2000-3200 m a.s.l); Washera (n = 31) sheep is short fat-tailed hairy sheep and mainly inhabits wet, warmer mid-highlands (1600-2497 m a.s.l); Horro (n = 30) sheep is long fat-tailed hairy sheep and predominant in midto high-altitude environments (1400-2000 m a.s.l) (Gizaw et al. 2008). Geographic positioning system (GPS) coordinates were recorded for each sheep population and geographical distribution map was developed based on their GPS coordinates ( Supplementary Fig. 1).
The samples were collected from different households in different villages using 5 ml of vacutainer tube with 1 ml EDTA as anti-coagulant. A maximum number of 1-3 distinct breeding adult animals were sampled per household based on flock owners' information and typical phenotypic characteristics that allow to avoid sampling of related animals. Genomic DNA was extracted using Quick-DNA™ Miniprep plus kit following the procedures of Biological Fluid and Tissue protocol (https ://WWW.zymor esear ch.com/m/ D4068 ). All 115 genomic DNA samples were genotyped using Ovine 50 K SNP BeadChip (Illumina, San Diego, CA, USA) by GeneSeek/Neogen (Lincoln, NE, USA).
To infer genetic relationships of Ethiopian sheep populations at national level, Ovine 600 K SNP BeadChip genotype data of 44 animals representing four extra Ethiopian sheep populations (Adilo = 11; Arsi-Bale = 8; Blackhead Somali = 13; Menz = 12) were included in the study. Detail description of the extra four Ethiopian sheep populations is summarized in Table 1. The genotype data (600 K) were obtained from NRSP-8 Community File Sharing Platform (https ://www.anima lgeno me.org/repos itory /pub/KORE2 017.1122/).Ovine 50 K SNP BeadChip genotype data of 105 animals representing six breeds from east (Kenya: Red Maasai), north (Egyptian Barki; Moroccan sheep) and south (South Africa: Namaqua Afrikaner) Africa, Middle East (Iran: Afshari) and south-west Asia (India: Garole) were included in the study to investigate the genetic relationships in greater detail and infer population structure of Ethiopian sheep populations at continental and global levels. The genotype data (50 K) were obtained from Sheep HapMap and Animal Resources dataset (https ://www. sheep hapma p.org/hapma p.php). Therefore, a total of 14 sheep populations, including extra four Ethiopian sheep, are included in the study as reference.

Quality control and genetic diversity analyses
Genotypic data for 600 K and 50 K platforms were merged and common autosomal SNP markers (42,227) were obtained. These common autosomal SNPs with call rate less than 90% and minor allele frequency (MAF) less than 0.05 were filtered out using PLINK v1.9 (Purcell et al. 2007) and leaving 37,771 autosomal SNPs. Using the same software, this generated dataset were further subjected to linkage disequilibrium (LD) pruning to avoid the possible influence of clusters of SNPs on population relationships and structure analyses (Yuan et al. 2017). Following the LD pruning, 30,292 autosomal SNPs were retained for population structure analyses.
To evaluate the levels of within-population genetic diversity, the expected (H E ) and observed (Ho) heterozygosity and inbreeding coefficient (F) were estimated for each population using Arlequin v.3.5.2 (Excoffier and Lischer 2010). To partition genetic variation among groups, among population within groups and individuals within population, analysis of molecular variance (AMOVA) was performed following 10,000 permutations in Arlequin v.3.5.2 (Weir and Cockrham 1984). The analysis was done for the 14 global sheep populations grouped into major clusters based their geographical origin and community the populations are kept. This include East African (Ethiopia and Kenya), Arab region (Egypt, Morocco and Iran), India (Indian Garole sheep) and South African (Namaqua Afrikaner sheep). We further regrouped Ethiopian sheep populations separately according to their tail phenotype (Short fat-tailed: Wollo, Farta, Washera and Menz; Long fat-tailed: Arsi-Bale, Adilo, and Horro; Fat rumped: Blackhead Somali) and geographical location (Northern: Washera, Farta, Wollo and Menz; Southern: Adilo, Arsi-Bale and Horro; Eastern: Blackhead Somali).

Genetic population structure analyses
Principal components analysis (PCA) was performed using PLINK v1.9 (Purcell et al. 2007) to investigate the genetic structure and relationships among the studied populations based on genetic correlations between individuals. A graphical display of the first two principal components (PC1 and PC2) was generated using gglot2 package provided by R (Wickham 2009). Population structure analyses carried out in STRU CTU RE v.2.3.4 (Pritchard et al. 2000) was used to investigate underlying genetic structure and estimate the proportion of shared genome ancestry between the studied populations. The STRU CTU RE output was further analyzed in STRU CTU RE HARVESTER (Earl 2012) to determine the optimal number of ancestral genomes (K) and proportions of genome ancestry shared among the studied populations using ΔK method (Evanno et al. 2005). To further evaluate the within and between Ethiopian and global sheep

Genetic diversity
The mean values of H E , Ho and F, as measures of withinpopulation genetic diversity, for 14 sheep populations are shown in Fig. 1 (Table 2). When an analysis was performed for Ethiopian sheep populations grouped based on their tail phenotype (Table 1), 3.14% (P < 0.001) of the variance  was among groups (Table 2). We further ran AMOVA by grouping the Ethiopian sheep populations according to their geographical distribution (Table 1). Results indicated that 4.67% (P < 0.001) of the variation was among groups, 3.40% among populations within groups and 91.93% within populations (Table 2).

Principal component analyses
The To illustrate relationships within individuals and among Ethiopian sheep populations, PCA was performed (Fig. 2b). Principal components 1 and 2 accounted for 16.89% and 9.17% of the total variation, respectively and clustered the eight sheep populations according to their tail phenotypes and geographical distribution: long fat-tailed (Arsi-Bale, Horro, Adilo), short fat-tailed (Washera, Farta, Wollo, Menz), and fat-rumped (Black Head Somali).

Phylogenetic network analysis
A Neighbor-Net network constructed from pairwise comparison cluster the global dataset (14 sheep populations) according to their geographic region and tail phenotype as indicated in Fig. 3. The line in the phylogenetic network

Genetic population structure analyses
Genetic population structure analysis on the global dataset (14 sheep populations) was carried out using hypothetical ancestral clusters (K) ranging from 2 to 15 and cluster the studied populations according to their geographic region and tail phenotype (Fig. 4). The highest ∆K value registered at K = 8 suggesting this to be the most optimal number of clusters explaining the variation in the dataset (Fig. 5a). The proportion of each ancestral cluster in each population at K = 8 is shown in Fig. 4 and Supplementary Table S2. Indian Garole and Namaqua Afrikaner sheep separated from the rest of sheep populations at K = 2 and K = 3, respectively. East Africa (Ethiopia and Kenya) sheep populations separated from the rest of the sheep populations at K = 4.
At optimal K (K = 8), Ethiopian (Blackhead Somali) and Kenyan (Red Maasai) sheep share up to 94% a single common genetic background (Cluster 3). The Asian thintailed sheep, Indian Garole, formed an independent cluster (Cluster 2) with high proportion (~ 95%). The South Africa fat-tailed sheep, Namaqua Afrikaner, formed a clear separate cluster (Cluster 5) with the highest proportion (~ 100%) from the rest of the populations. The two North African representatives shared one genetic background (Cluster 8), and both shared 22-26% common genetic background with Afshari of Iranian sheep (Cluster 1). The Asian thintailed sheep showed distinct clustering from east Africa and Three to nine hypothetical ancestral clusters (K) were tested with structure on the Ethiopian dataset (national dataset). The highest ∆K value suggested K = 4 as most optimal number of ancestral clusters present in the national dataset (Fig. 5b). The proportion of each ancestral cluster in each population at K = 4 is shown in Fig. 6 and Supplementary Table S3. Menz (highest proportion ~ 100%), Wollo, Farta and few individuals of Washera sheep share one genetic cluster group (Cluster A). Cluster B observed in Washera (proportion ~ 55%) and few individuals of Wollo, and Farta sheep. Horro, Arsi-Bale (ARB) and Adilo sheep form Cluster C whereas Cluster D observed in Blackhead Somali (BHS) and some individuals of Arsi-Bale (ARB) and Adilo sheep. Ethiopian short fat-tailed sheep (Washera, Wollo, Farta, and Menz) did not represent one genetic cluster group rather they form two different genetic cluster groups (Cluster A and B) which is supported by admixture analysis for other Ethiopian short fat-tailed sheep populations (Ahbara et al. 2019). Farta, Wollo and Menz sheep formed one genetic cluster group (Cluster A) while Washera and few individuals of Farta and Wollo sheep formed another genetic cluster group (Cluster B). Close clustering was observed among Ethiopian long fat-tailed sheep (Horro, Adilo, and Arsi-Bale) (Cluster C). The findings are further supported by national principal component and global sheep genetic population structure analyses results (Figs. 2b, 4, respectively).

Discussion
The genotype data generated by Ovine 50 K SNP BeadChip were used to investigate genetic diversity and population structure of Ethiopian indigenous sheep populations. To assess the genetic diversity and relationships in greater detail and infer population structure of Ethiopian sheep populations at the continental and global levels, populations from other regions of the African continent and the world were included. The findings revealed that the estimated within Ethiopian sheep population diversity is in line with previous studies on indigenous sheep populations in Ethiopia using microsatellite markers and SNP chip (Gizaw et al. 2008;Edea et al. 2017;Ahbara et al. 2019). The presence of high genetic diversity within-population is congruent with the high variability observed in phenotypic characters, particularly in coat color within the sub-alpine sheep populations (Gizaw et al. 2008). High degree of genetic diversity withinpopulation is a characteristic of large traditional populations that have not been subjected to strong selection indicating the need to conserve such traditional populations (Lauvergne et al. 2000), and thus can be exploited through implementing appropriate breeding strategies.
Ethiopian indigenous sheep showed slightly lower levels of genetic diversity (H E = 0.366) than the presumed ancestral populations in the Middle East (Afshari; H E = 0.376) and North Africa (H E = 0.401). Populations found near to domestication center are expected to retain higher allelic diversity than those that migrated farther away (Peter et al. 2007). The higher level of diversity estimates in North Africa as compared to east Africa populations can be further explained by the fact that North Africa populations reflect a high degree of admixture between fat-tailed and thin-tailed sheep as demonstrated in the STRU CTU RE analyses result ( Fig. 4 and Supplementary Table S2). Given its proximity to the Near East and Mediterranean Sea, North Africa served as a gateway for early livestock introduction to the African continent and is considered as a secondary hotspot of genetic variation (Gautier 2002).
Although they are defined by the same tail phenotype, the four Ethiopian short fat-tailed sheep (Wollo, Menz, Farta and Washera) did not cluster together. Adaptation to different eco-climates that can impede gene flow, may have shaped this genetic sub-structuring (Madrigal et al. 2001;Gizaw et al. 2007). The close clustering of the three short fat-tailed sheep (Wollo, Farta and Menz) is consistent with adaptation to similar eco-climates (sub-alpine), similar in tail phenotype and they are maintained by the same ethnic group or community which may act as barriers to gene flow that shape population genetic structure (Madrigal et al. 2001). The distinct clustering of the warmer mid-highland short fat-tailed sheep (Washera) from the three sub-alpine short fat-tailed sheep (Wollo, Farta and Menz) could be Fig. 6 Population structure analyses of Ethiopian sheep associated with adaptation to different eco-climates and face different selection pressure, which may have shaped their genomes in different manner. Likewise, The close association observed among the three long fat-tailed sheep populations (Horro, Arsi-Bale, and Adilo) could be due to they are predominantly maintained by the same ethnic group or community, have same tail phenotype, and inhabit similar eco-climates (highland) and face common selective pressures, which may have shaped their genomes in a similar manner. The chances of animal exchange are greater within the same ethnic group or community than between any two different ethnic groups or communities (Gizaw et al. 2007).
The distant association observed between the short fattailed sheep, Menz, and the fat-rumped sheep, Black Head Somali, could be due to selection for ecological adaptation, geographical isolation and differences in tail phenotype (Gizaw et al. 2008;Edea et al. 2017). This finding is not in line with the report of Ahbara et al. (2019). The author indicated that the short fat-tailed, Molale-Menz, clustered together with the Ethiopian fat-rumped sheep which could be close proximity of sampling sites for the two different sheep populations since they are distinct in tail phenotype, adapted to different eco-climates, and maintained by different ethnic group or community which may have shaped their genomes differently (Gizaw et al. 2008).
The close association between Ethiopian fat-rumped sheep, BHS, and the Kenyan fat-tailed sheep, Red Maasai, could be because of the two sheep populations are reared under mobile pastoral and agro-pastoral systems and there is a high chance of exchanging animal across the border (Wilson 2011). They are also adapted to similar ecological environments (lowland) and face common selective pressures (Wilson 2011). The close clustering of Ethiopian fat-rump sheep (BHS) and Kenyan fat-tailed sheep (Red Maasai) suggests the dependent introduction and dispersion histories of Africa fat-tailed and fat-rumped sheep into the continent. This finding is agree with previous report that indigenous African sheep genetic resources have been classified into two main groups with a largely non-overlapping distribution: thin-tailed and fat-tailed (including fat-rump) sheep (Wilson 1991) but not in line with the report of Ryder (1984) which indicated that the fat-tailed sheep were introduced into Africa during the third wave of migration following thin-tailed hair sheep and thin-tailed wool sheep, fat-rumped sheep much later.
The close association of the two phenotypically different (variation in tail type) North African sheep populations could be due to gene flow between the two populations since they have close geographical proximity. Previous studies revealed that north Africa is mostly populated by fat-tailed sheep (Muigai and Hanotte 2013), but our STRU CTU RE analysis indicated there is high signatures of admixture in the genomes of north Africa populations as compared to their east and South African populations and they shared 22-26% its genome with Middle East thin-tailed sheep (Iranian Afshari). The influence of Middle Eastern thin-tailed sheep detected in north Africa sheep can be explained by the historical introduction of sheep into Africa and their dispersion across the continent through the Nile Valley; for instance, thin-tailed sheep spread into the Western Sahara via Northern Africa (Muigai and Hanotte 2013), which may have left its genomic footprint in the current north Africa sheep populations.
The close clustering of east Africa sheep populations and distinct separation from their north counterparts was well demonstrated by our phylogenetic and PCA analyses (Figs. 3, 2a, respectively). This result coincides with the evidence that fat-tailed sheep were introduced into Africa via two independent routes: one via the north-east Africa and the Mediterranean Sea coastline, and the other via the Horn of Africa crossing through the strait of Babel-Mandeb (Ryder 1984).
The distinct clustering of Asian thin-tailed sheep from fat-tailed east and South Africa as well as from fat-rumped east Africa (Ethiopia) sheep suggests its independent introduction into the continent. It is in line with the distinct histories and non-overlapping geographic distributions of the African thin-tailed with fat-tailed and fat-rumped sheep (Hanotte et al. 2002;Muigai 2003b), and support the predominance of fat-tailed sheep in the east and South parts of Africa (Muigai and Hanotte 2013). Moreover, analyses of autosomal markers and the Y chromosome have revealed the distinct evolutionary histories of thin-and fat-tailed African sheep populations (Aswani 2007).

Conclusions
Our genome-wide SNP analyses revealed that there is clear signature of admixture among Ethiopian sheep populations which could be accounted by some level of current admixture which results in low variation among the sheep populations but large within population variation. The principal component (PCA) and population structure analyses of Ethiopian sheep revealed four distinct genetic cluster groups according to their tail phenotype and geographical distribution. The short fat-tailed sheep populations did not represent one genetic cluster group which require further investigation. Our population structure analyses of Ethiopian sheep population demonstrated a clear pattern of the tail morphology and their phylogeography. Further investigation is required on morphometric basis of tail morphology variation in indigenous Ethiopian sheep populations to confirm the genetic basis of tail morphology variation investigated (Ahbara et al. 2019). Principal component and population structure analyses of global sheep population suggests the independent introduction of thin-tailed sheep into the continent but the dependent introduction and dispersion histories of fat-tailed and fat-rumped sheep. Principal component and phylogenetic analysis of global sheep population results coincide with the evidence that fat-tailed sheep were introduced into Africa via two independent routes: Horn of Africa, via the strait of Bab-el-Mandeb and Northern Africa, via the Isthmus of Suez from the Middle East (Ryder 1984).