Recent high-resolution mtDNA studies offer the possibility of shedding light on ancient and recent human migration events, allowing more precise inferences about the geographical origin of lineages observed nowadays in a given region. In fact, characterization of the full mtDNA sequence is being used to investigate local events such as the Chadic expansion from East Africa towards the Chad Basin in the last 8,000 years [1], or historic movements such as the Jewish diaspora [2, 3], which could not be addressed in previous, more limited mtDNA surveys.

This approach is being applied to the long-standing discussion about prehistoric migrations across the Mediterranean Sea, which led to an exchange of lineages between Iberia and the Maghreb [4, 5]. Recently, the sub-characterization of H lineages observed in several North African populations revealed their affiliation with lineages that expanded from Iberia after the Last Glacial Maximum [6, 7]; the same was observed in Tuareg living in the Sahel [8]. The Near Eastern contribution to the pool of H lineages in North Africa was minimal, indicating that a prehistoric input of European lineages occurred at elevated frequencies, enriching the ancient Near Eastern background of North African populations, which is mainly constituted by the low-frequency haplogroups U6 and M1 [9].

Another major contribution to the North African pool was the sub-Saharan one. It is known that one quarter to one half of the North African female pool is made up of typical sub-Saharan lineages (designated as haplogroups L0-L6), with frequencies increasing with geographic proximity to sub-Saharan Africa [4, 5]. Nevertheless, the Sahara has been a strong geographical barrier to gene flow, at least since 5,000 years ago, when desertification affected a larger region, ending the humid, green conditions established around 10,000 years ago in the so-called Holocene Climatic Optimum [10].

But if geographical and climatic conditions have not been favorable to sub-Saharan gene flow into North Africa in the last 5,000 years, the Arab trans-Saharan slave trade could have greatly facilitated this migration of lineages. Until now, the genetic consequences of these forced trans-Saharan movements of people have not been ascertained, having been overshadowed by the Atlantic slave trade towards the New World. In fact, the huge number of sub-Saharan people introduced into the New World from the 16th century onwards has allowed the genetic consequences of this historical event to be investigated in great detail [11-13], and the complete sequencing of L lineages is pinpointing very precisely the origin of lineages observed nowadays in America [14]. Nonetheless, some authors affirm [15] that the Arab trade in black slaves was much the same in total as the Atlantic slave trade and, interestingly, lasted far longer. It began in the middle of the seventh century (650 A.D.) and still survives today in Mauritania and Sudan, spanning 14 centuries rather than the four of the Atlantic slave trade. Although estimates are very rough, figures are 4,820,000 for the Saharan trade between 650 and 1600 A.D. and, for comparison, 2,400,000 for the Red Sea and Indian Ocean trade between 800 and 1600 A.D. [16]. Notwithstanding the thousands of kilometers along the edge of the Sahara, the Red Sea and the East African coast from which slave exports came, there were relatively few export points, which concentrated the impact of the trade geographically.
Black slaves were brought by Berber and Arab merchants mainly to present-day Morocco, Algeria, Libya and Egypt through six main routes that crossed the desert (Figure 1): one went north to Morocco from ancient Ghana (at present southeastern Mauritania and western Mali); a second brought slaves to Tuwat (southern Algeria) from ancient Timbuktu (Mali); a third passed from the Niger valley and the Hausa towns through the Air Massif to Ghat and Ghadames; a central route linked the Lake Chad region to present-day Libya (Murzuk), and was one of the most important in the slave commerce because it offered oases at regular intervals that could satisfy the caravans' needs; in East Africa, the slave caravans mainly followed the Nile River from present-day Sudan (Dar Fur) to Egypt (Assiout); and a sixth passed north from the confluence of the Blue and the White Nile to Egypt. Some of these routes were interconnected: the routes north from Timbuktu went to Morocco, Algeria and Libya, while the Dar Fur-Egypt route connected with the route north from the upper Nile valley.

Figure 1

Routes for trans-Saharan slave trade. Adapted from Segal (2002) and Lovejoy (1983).

Males were sought for a variety of functions: doorkeepers, secretaries, soldiers or eunuchs. Black soldiers were seen from Islamic Spain to Egypt, and in Morocco a whole generation of black boys were bought at the age of 10 or 11 and trained to become its army. However, the bulk of the trade was in females, as domestic servants, entertainers and/or concubines: two females for every male overall, in contrast to the ratio of two males for every female overall in the Atlantic trade [15]. Some harems could be enormous, reaching even the extravagant number of 14,000 concubines. Young female slaves were instructed in household crafts and were then provided with resources to buy a home and get married.

The Eastern sub-Saharan slave trade towards Arabia was investigated through mtDNA hypervariable region I (HVRI) diversity [17], leading to the conclusion that L lineages are observed at higher frequencies in Arab than in non-Arab populations of the Near East, having been introduced in the last 2,500 years. These conclusions were later supported by other studies [18, 19]. This Eastern sub-Saharan slave trade involved mainly maritime routes across the Red Sea, which was dominated by the Southern Arabs as early as around the 12th century BC.

The Western trans-Saharan slave trade deserves a more careful genetic investigation. In this work we present the results of mtDNA haplogroup affiliation in the population of El Jadida, approximately 100 km south of Casablanca, on the Moroccan Atlantic coast. We performed high-resolution screening of selected haplogroups in this Moroccan sample: haplogroup H, in order to obtain more evidence on the North Mediterranean influence; and haplogroup L3, one of the most geographically diversified sub-Saharan haplogroups. For haplogroup L3, we conducted complete mtDNA sequencing of 8 L3 haplotypes from El Jadida, and compared the complete North African L3 sequences described so far [14, 20-22] with the many other known sub-Saharan sequences (summed up in [23]). We also performed geographical interpolation analyses of sub-Saharan haplogroup frequencies across Africa, using an extended database of 4908 individuals.


Samples and DNA extraction

Blood samples were collected from 81 unrelated people from El Jadida, Morocco, nearly 100 km south of Casablanca. Appropriate informed consent was obtained from all individuals and total DNA was extracted from blood using a standard Chelex 100 protocol.

mtDNA amplification and sequencing

The mtDNA hypervariable regions I and II (HVRI and HVRII) were amplified as described elsewhere [24], in both forward and reverse directions. The amplified samples were purified with Microspin S-300 HR columns (GE Healthcare, Uppsala, Sweden) and automated sequencing was carried out in an ABI Prism 3100 (AB Applied Biosystems, Foster City, CA, USA) using the Big-Dye Terminator Cycle Sequencing Ready Reaction kit (AB Applied Biosystems, Foster City, CA, USA). The temperature profile for the sequencing reactions consisted of denaturation at 96°C for 4 min and 35 cycles of 96°C for 15 s, 50°C for 9 s and 60°C for 2 min, followed by 60°C for 10 min. Sequences were edited using the BioEdit software [25] and by manually checking the electropherograms; both tasks were performed by two independent investigators.

Haplogroup H variation was dissected in a total of 14 samples according to [26], which consisted of sequencing four mtDNA coding-region segments encompassing the principal diagnostic positions in haplogroup H samples: 3001-3360, 3661-4050, 4281-4820, and 6761-7050 (a total of 1580 base pairs). Furthermore, haplogroup L3 variation was investigated in 8 samples by complete sequencing of the molecule (~16,569 bp) as described in [27], in a total of 32 overlapping segments of around 600 bp each. The 8 complete mtDNA sequences are deposited in the GenBank database under accession numbers GU455415-GU455422.
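The total of 1580 base pairs follows from counting both endpoints of each segment. A minimal sketch of that arithmetic is given here; the helper names `segment_length` and `is_covered` are illustrative only and are not part of the protocol in [26].

```python
# Coding-region segments sequenced for haplogroup H, as (start, end) positions
# in rCRS numbering, taken from the text above.
SEGMENTS = [(3001, 3360), (3661, 4050), (4281, 4820), (6761, 7050)]


def segment_length(start, end):
    """Length in base pairs of a segment whose two endpoints are both included."""
    return end - start + 1


def is_covered(position, segments=SEGMENTS):
    """True if a position falls inside any of the sequenced segments."""
    return any(start <= position <= end for start, end in segments)


lengths = [segment_length(start, end) for start, end in SEGMENTS]
total_bp = sum(lengths)
print(lengths, total_bp)
```

A helper such as `is_covered` can be used to check whether a given diagnostic position lies within the typed segments before relying on it for sub-haplogroup assignment.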

Haplogroup affiliation

Mutations were scored relative to the revised Cambridge Reference Sequence (rCRS; [28]), with positions numbered from 1 to 16569. For haplogroup affiliation, the most recent phylogenetic data, including information from complete sequencing, were followed: for H [29]; for K [2]; for J, R, T, and V [30]; for U [30, 31]; for I and M1 [32]; for X [33]; and for L [14].

Statistical analyses

Analysis of population structure, molecular diversity measures, and tests of selective neutrality were carried out in Arlequin version 3.0 [34].

Phylogenetic reconstruction of mtDNA sequences was based on HVRI and on the complete sequence. A preliminary network analysis [35] led to a suggested branching order for the tree, and the L3 tree published in [14] was used as the reference tree. The dates of the most recent common ancestor of specific subclusters in the phylogeny were estimated using ρ, the average number of transitions from the ancestral sequence type to all sequences in the cluster, based on the recently updated mutation rate published in [36] for the entire molecule (1 mutation every 3624 years), and using the calculator provided in that paper. The highly variable position 16519 was not considered for the time estimates. Each tip node of the phylogenetic tree was counted as one event if shared by a few samples.
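The ρ estimate can be illustrated with a minimal sketch. It makes two assumptions that are not stated in the text: the conversion from ρ to years is taken as strictly linear at 3624 years per mutation (the calculator in [36] may apply additional corrections), and the standard error uses the estimator commonly paired with ρ, in which each branch contributes its mutation count weighted by the square of the number of samples descending from it. The function name and the toy tree are hypothetical.

```python
import math

YEARS_PER_MUTATION = 3624  # whole-molecule rate quoted in the text


def rho_age(branches, n_samples, years_per_mutation=YEARS_PER_MUTATION):
    """Estimate rho, the cluster age and its standard error.

    branches: list of (mutations_on_branch, samples_descending_from_branch)
    n_samples: number of sampled sequences in the cluster
    Assumes a linear clock; this is a simplification, not the published calculator.
    """
    rho = sum(l * n for l, n in branches) / n_samples
    sigma = math.sqrt(sum(l * n * n for l, n in branches)) / n_samples
    return rho, rho * years_per_mutation, sigma * years_per_mutation


# Hypothetical cluster of four sequences: one mutation shared by two of them,
# plus private branches carrying 1, 2 and 1 mutations.
toy_branches = [(1, 2), (1, 1), (2, 1), (1, 1)]
rho, age_years, se_years = rho_age(toy_branches, n_samples=4)
print(rho, age_years, round(se_years))
```

In this toy case ρ is 1.5 mutations, so the linear conversion gives 5436 years; the standard error shrinks as more sequences are added to the cluster.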

To determine and visualize the geographical distribution of L haplogroups, interpolation maps were drawn using the "Spatial Analyst Extension" of ArcView version 3.2. The "Inverse Distance Weighted" (IDW) option with a power of two was used for the interpolation of the surface. IDW assumes that each input point has a local influence that decreases with distance. The geographic location used is the centre of the distribution area from which the individual samples of each population were collected. Data for other populations were taken from several publications and are summed up in Additional File 1 and displayed in Figure 2A.
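The principle behind IDW with a power of two can be sketched as follows. This is only an illustration of the weighting scheme: it uses plain planar distances between coordinates and all input points, whereas the settings ArcView applies internally (search radius, number of neighbours, map projection) are not given in the text. The function name and the toy frequencies are hypothetical.

```python
import math


def idw(points, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    points: list of ((x, y), value) pairs, e.g. sample location and haplogroup frequency
    target: (x, y) location at which to interpolate
    Each point is weighted by 1 / distance**power, so nearby samples dominate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in points:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # the target coincides with a sampled location
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical samples with L frequencies of 0.2 and 0.8.
samples = [((0.0, 0.0), 0.2), ((2.0, 0.0), 0.8)]
midpoint = idw(samples, (1.0, 0.0))
near_first = idw(samples, (0.5, 0.0))
print(midpoint, near_first)
```

At the midpoint both samples weigh equally, while a location three times closer to the first sample is pulled strongly towards its value, which is the local influence described above.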

Figure 2

Map showing location of the population samples (A) used in this work and interpolation map for the L lineages in those samples (B).

Correlograms of Moran's I indices versus distance were obtained for the total L frequency in the populations and for the L0, L1, L2 and L3 proportions of the sub-Saharan pool in the samples, using the PaSSAGE software v 1.0 [37]. The existence of a cline is assumed when a continuous declining trend composed of statistically significant points is observed.
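For readers unfamiliar with correlograms, one point of such a plot can be sketched as Moran's I computed with binary weights for a single distance class. This is a simplified illustration and not the PaSSAGE implementation: distances are planar, no significance test is performed, and the function name and toy data are hypothetical.

```python
import math


def morans_i(coords, values, d_min, d_max):
    """Moran's I for pairs of samples whose distance lies in [d_min, d_max).

    coords: list of (x, y) sample locations
    values: list of frequencies, one per location
    Returns None when the class contains no pairs or the values do not vary.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    variance_sum = sum(d * d for d in deviations)
    cross_sum = 0.0
    pair_count = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if d_min <= math.dist(coords[i], coords[j]) < d_max:
                cross_sum += deviations[i] * deviations[j]
                pair_count += 1
    if pair_count == 0 or variance_sum == 0:
        return None
    return (n / pair_count) * cross_sum / variance_sum


# Four hypothetical samples along a transect with steadily increasing frequency.
transect = [(0, 0), (1, 0), (2, 0), (3, 0)]
frequencies = [0.1, 0.2, 0.8, 0.9]
short_range = morans_i(transect, frequencies, 0.5, 1.5)
long_range = morans_i(transect, frequencies, 2.5, 3.5)
print(short_range, long_range)
```

A positive value in the shortest class together with a negative value in the longest class is the pattern read as a cline once the individual points are statistically significant.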

Results and Discussion

mtDNA diversity and haplogroup affiliation in El Jadida sample

The characterization of HVRI and HVRII diversity in the 81 individuals from El Jadida led to the identification of the haplotypes reported in Table 1. The HVRI mtDNA diversity observed (Table 2) was, in general, as high as that observed in other North African populations [4-6], and the Fu's Fs values for the neutrality tests were significantly negative, in accordance with expanding populations, with the notable exception of the Libyan Tuaregs reported in [38].

Table 1 Haplotypes (for HVRI, HVRII and the four coding segments typed in possible H samples) and haplogroup classification in El Jadida.
Table 2 Diversity measures in El Jadida and neighbour populations, within HVRI.

The analysis of molecular variance (AMOVA) was performed in order to evaluate genetic structure within North Africa, revealing a residual 3% of variation between populations. Regarding pairwise FST genetic distances (not shown), the only significant values after Bonferroni's correction were between El Jadida-Algeria (0.061; p = 0.000 ± 0.000), El Jadida-Tuaregs from Libya (0.022; p = 0.000 ± 0.000), El Jadida-Morocco-Berbers (0.027; p = 0.000 ± 0.000) and El Jadida-El Alia from Tunisia (0.024; p = 0.000 ± 0.000).

When analyzing the proportions of sub-Saharan and West Eurasian mtDNA haplogroups (Table 1) in the El Jadida population, the characteristic mixed pool was observed, with frequencies of 30.86% and 69.14%, respectively. The sub-Saharan pool presented the branches L1, L2 and L3 at frequencies of 24%, 28% and 48% of the sub-Saharan pool, respectively. The basal haplogroup L0 was absent. In the West Eurasian pool, the haplogroups said to have been introduced into North and East Africa as a result of a Back-to-Africa migration from the Near East, U6 and M1, were observed at frequencies of 2.47% and 6.17% in El Jadida.
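The percentages above can be cross-checked against the sample size of 81. The sketch below back-calculates the implied numbers of individuals; these counts are inferred from the reported percentages for illustration and are not quoted from Table 1. The helper name `implied_count` is hypothetical.

```python
N_SAMPLED = 81  # individuals from El Jadida


def implied_count(percent, total):
    """Whole number of individuals implied by a percentage of `total`."""
    return round(percent * total / 100)


# Split of the whole sample into sub-Saharan and West Eurasian lineages.
sub_saharan = implied_count(30.86, N_SAMPLED)
west_eurasian = implied_count(69.14, N_SAMPLED)

# Shares of the sub-Saharan pool reported for L1, L2 and L3.
l_counts = {name: implied_count(pct, sub_saharan)
            for name, pct in [("L1", 24), ("L2", 28), ("L3", 48)]}

print(sub_saharan, west_eurasian, l_counts)
```

The implied counts add up to the full sample and to the full sub-Saharan pool, so the reported percentages are internally consistent.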

Clearly, the main component of the West Eurasian lineages consisted of lineages that possibly expanded from Iberia following the post-glacial climate improvement: H1 (12.35%), V (9.88%) and U5b (1.23%). There were low-frequency lineages belonging to the HV branch of the maternal tree which could have come to El Jadida from the Near East (H* - 3.70%; H7 - 1.23%; HV1 - 1.23%), as well as R0a (3.70%), X (1.23%), N1b (1.23%), J (7.41%) and T (2.47%). There was also a considerable proportion of U/K lineages, besides the already mentioned U6 and U5a: K (9.88%), U* (3.70%) and U4 (1.23%). Curiously, five out of eight K individuals in El Jadida presented a substitution at position 16287 (besides the haplogroup-defining 16224-16311 polymorphisms); this haplotype has so far been observed in one Italian (belonging to sub-haplogroup K1a4) and two Moroccan individuals (sub-haplogroup K1a2) out of 789 K sequences in [2], and was absent in other North African populations [6].

Sub-Saharan haplogroups across North Africa

Based on a database of 4908 African and 2178 Near Eastern/Arabian Peninsula individuals (Figure 2A shows sample locations, further indicated in Additional File 1), we performed interpolation analyses of L haplogroup frequencies. As can be seen in Figure 2B, the north-to-south increase in frequency across North Africa and the Sahara is visible. In the east of the African continent, the highest L frequencies are attained at more southern latitudes than in the rest of the continent, due to the presence of M and some N (R0a and U6) lineages, which are especially frequent in Ethiopia.

We then focused on the region across the Sahara, for each of the main L haplogroups. When interpolation analyses are performed on frequencies in the total population, any sign of a gradient across the Sahara is lost, because the differences between L frequencies south and north of the desert are large. For this reason, interpolation analyses were performed on the frequency of each haplogroup within the L pool, enhancing the possibility of detecting gradients across the Sahara.

L0 (Figure 3) attains its highest proportion within the L pool in East Africa, including the Near East and Arabian Peninsula, with frequency decreasing from south to north. This pattern coincides with that of haplogroup L0a, while L0d and L0f are almost restricted to the south.

Figure 3

Interpolation maps for L0 haplogroup in the sub-Saharan pool observed in each sample.

L1 total (Figure 4) attains its highest proportions in the L pool in Central Africa, in Pygmy populations, followed by some of the north-west populations. This presence of L1 in north-west African samples is mainly due to sub-haplogroup L1b, while L1c is largely restricted to Central Africa. The presence of this haplogroup in the Near East and Arabian Peninsula is quite limited.

Figure 4

Interpolation maps for L1 haplogroup in the sub-Saharan pool observed in each sample.

L2 total (Figure 5) is one of the two dominant haplogroups in the L pool in many regions across Africa, namely in the central-west and south-east regions, most probably due to the Bantu expansion [11, 12], and towards the north-west, potentially due to the trans-Saharan slave trade. The most central African populations, mostly Pygmy groups, present low proportions of L2 lineages in their pools. This pattern is caused mainly by sub-haplogroup L2a, the most frequent lineage in L2, while L2b, L2c and L2d attain their highest proportions on the west coast between Senegal and Mauritania.

Figure 5

Interpolation maps for L2 haplogroup in the sub-Saharan pool observed in each sample.

L3 total (Figure 6) reaches its highest proportions in North Africa, followed by East Africa. The sub-haplogroups L3b and L3d clearly dominate in the west, as known before, as well as in North Africa. L3e has a more central dispersion across the Sahara and is also frequent in South Africa. L3f has an eastern localization across the Sahara, with some foci in Central Africa south of the Sahara, due to high frequencies of L3f3 in Chadic-speaking groups [1]. L3h, L3i, L3w and L3x (Figure 7) are rare and clearly limited to East Africa.

Figure 6

Interpolation maps for L3 total, L3b, L3d, L3e and L3f haplogroups in the sub-Saharan pool observed in each sample.

Figure 7

Interpolation maps for L3h, L3i, L3x and L3w haplogroups in the sub-Saharan pool observed in each sample.

When the spatial autocorrelation analysis was applied to the total L frequency in the populations, and to the L0, L1, L2 and L3 proportions of the sub-Saharan pools in the samples, signs of a cline were evident for all of them (Figure 8). The positive values at small distances indicate that individuals from the same population are more similar to each other, while the negative values at the largest distances (not so clear for L1 and L2) suggest a marked genetic differentiation across the African continent and Arabian Peninsula.

Figure 8

Spatial correlograms of Moran's I indices for the total L frequency in the populations, and for the L0, L1, L2 and L3 proportions of the sub-Saharan pools in the samples. Geographic distances separating samples are distributed into 14 classes. Full dots represent significant p-values (p < 0.05); empty dots are non-significant p-values.

Complete L3 sequences

We performed complete sequencing of 8 different L3 haplotypes observed in El Jadida. This haplogroup was selected because it is the most diversified sub-Saharan haplogroup in El Jadida and some of its lineages could have been introduced into North Africa from East Africa. The complete sequencing allowed the fine characterization of these samples as follows (Figure 9): one L3b1, two L3d1'2'3, one L3e2b, one L3f1a, two L3f1b and one L3h1b.

Figure 9

Phylogeny of the complete L3 sequences from El Jadida. Integers represent transitions when the suffixes "A", "G", "C" or "T" are appended and transversions when the suffixes "a", "g", "c" or "t" are appended. Deletions are indicated by a "d" following the deleted nucleotide position. Underlined nucleotide positions appear more than once in the tree.
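The labelling convention in this caption (upper-case suffix for a transition, lower-case suffix for a transversion) can be expressed as a short sketch. The function names are hypothetical, and the reference bases passed in the usage lines are illustrative inputs rather than values read from the figure.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(reference_base, variant_base):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, var = reference_base.upper(), variant_base.upper()
    if ref == var:
        raise ValueError("reference and variant bases are identical")
    # A transition stays within purines or within pyrimidines.
    same_class = {ref, var} <= PURINES or {ref, var} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def tree_label(position, reference_base, variant_base):
    """Label a substitution as in the caption: upper case for a transition,
    lower case for a transversion."""
    kind = classify(reference_base, variant_base)
    suffix = variant_base.upper() if kind == "transition" else variant_base.lower()
    return f"{position}{suffix}"


# Illustrative inputs: a T-to-C change is a transition, an A-to-C change is not.
print(tree_label(16311, "T", "C"), tree_label(100, "A", "C"))
```

Because ρ is defined above as the average number of transitions, a classification of this kind is what separates the changes that enter the age estimates from those that do not.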

Joining these 8 complete L3 sequences to 236 previously published ones (those summed up in [1, 14, 20-22]), a good resolution of the L3(xM, N) tree is obtained (Additional File 2; information on the samples used is listed in Additional File 3). There are 39 sequences from North Africa, representing 16% of the complete L3 dataset: 10 from Morocco, one from Algeria, four from Libya, 11 from Tunisia, and 13 from Egypt. This work thus made the sampling of complete L3 sequences across North Africa more homogeneous.

Most of these North African sequences share a recent ancestry, within the Holocene, with sequences observed in other parts of Africa (Table 3). This seems to point to a recent introduction of these lineages into North Africa from their original locations in sub-Saharan and East Africa. Namely, one Moroccan and one Libyan sequence belong to sub-haplogroup L3b1b, together with two West African sequences from Burkina, with a coalescence age of 9,926 ± 2,555 years. Three Egyptian, four Tunisian, one Libyan and one Moroccan sequences share a most recent common ancestor 13,537 ± 1,058 years old with seven West African, two South African, six American (most probably of African descent), two East African, two Central African, five Near Eastern and two South Asian sequences, all affiliated with haplogroup L3b1a. A Moroccan sequence shares an ancestor around 13,370 ± 4,205 years old with one sequence from Guinea-Bissau, inside haplogroup L3b2. One Tunisian L3d1c sequence shares an ancestor with one American sequence of African descent at 9,246 ± 3,444 years ago. One Tunisian shares an ancestor at around 6,549 ± 2,883 years ago with one Syrian inside haplogroup L3d1'2'3. One Tunisian and one Egyptian, together with four individuals from Burkina, one from Guinea-Bissau and two Americans, share an ancestor at 14,179 ± 2,352 years ago, belonging to haplogroup L3e2a. In haplogroup L3e2b, two Egyptians and one Moroccan share a most recent common ancestor at 11,985 ± 1,529 years ago with one Ethiopian, one individual from Zaire, three West Africans and five Americans (with a younger co-ancestry between an Egyptian and one American at around 1,287 ± 1,278 years ago inside L3e2b2). One Egyptian, one Libyan and one Tunisian L3e5 sequence share an ancestor 11,516 ± 2,264 years old with one sequence from Burkina, one Ethiopian, one Sudanese and one American (with a somewhat younger co-ancestry between the Tunisian and the Ethiopian at around 10,610 ± 3,704 years ago).
A Moroccan L3f1a sequence shares a common ancestor with one Chadic sample at 14,766 ± 4,448 years ago. Haplogroup L3f1b, which has a most recent common ancestor 14,710 ± 1,227 years old, includes some sequences from North Africa (two Egyptian and two Moroccan) and many others from other African locations and the Near East, with one Egyptian sample sharing a younger ancestor, at 4,343 ± 2,388 years ago, with one Jordanian and one American.

Table 3 Age estimates and standard deviations (in years) for the Most Recent Common Ancestor for the related lineages in North and sub-Saharan Africa.

A few L3 sequences observed in North Africa have older co-ancestry with other sub-Saharan regions, but as this occurs in the rarer haplogroups (almost restricted to East Africa), the scenario will most probably change as these become better characterized. This is the case for one L3x2 sequence observed in Algeria, which shares an older most recent common ancestor with two Ethiopian, one Israeli and one Kuwaiti sequence, at 33,165 ± 4,499 years ago, whereas one Ethiopian and the Israeli and Kuwaiti sequences share a younger ancestor at 19,012 ± 4,200 years ago. Also, one Egyptian L3f2b sequence shares an ancestor with a Chadic one at around 24,809 ± 5,935 years ago. For haplogroup L3h1a2, one Egyptian and one Lebanese sequence share a coalescence age of 26,281 ± 6,139 years. And for L3h1b, with an age of 36,827 ± 3,772 years, one of the North African sequences (one Tunisian and one Moroccan) has a most recent common ancestor 14,766 ± 4,448 years old with a sequence from Guinea-Bissau.

So far, the only two complete published samples belonging to haplogroup L3k have a North African origin, one from Libya and one from Tunisia. This haplogroup has a coalescence age of around 29,251 ± 6,524 years. As this haplogroup cannot be identified reliably from control-region information alone (only through the HVRII polymorphism at position 235), no additional information about it can be added at present.


The genetic information testifies that recent migrations were the main events leading to the mtDNA pool observed nowadays in Maghreb populations. The ancestral Near Eastern pool, a remnant of the ancient Back-to-Africa migration through the Levant around 40,000 years ago [9], is very restricted. Values for these haplogroups are around 8.6% in El Jadida and 10% in Tunisia [6]. The bulk of the West Eurasian lineages present in Maghreb populations is constituted by the typical Iberian sub-haplogroups H and V (12.3% and 9.9%, respectively, in El Jadida). It is highly probable that these lineages expanded towards North Africa when they expanded from Iberia to the rest of the European continent, around 14,000 years ago, as they are present in all North African populations, even in those with no known direct historical relationship with Iberia [6].

Recent mtDNA data have shown that considerable local population expansions occurred in Sahel nomadic populations around 4,000 years ago, following important movements of northern and eastern African people towards the recently formed Sahel region. These local expansions were revealed in one branch of the typical East African haplogroup L3f, namely L3f3, which is almost restricted to the Chadic-speaking nomadic groups [1], and in one branch of the typical Iberian haplogroup V in southern Tuareg populations [8]. Thus, the emergence of the modern Sahara, beginning some 4,000 years ago, hardened existing geographical divisions and separated peoples, forcing the black Saharans into the oases or southwards into the more attractive lands of the Sahel.

This barrier to gene flow is evident when considering the global L haplogroup frequencies in African populations. There is a clear horizontal gradient across the continent, attaining values of 95% and higher in the Sahel region in West and Central Africa, but not on the eastern African coast, where those values are only reached around the border between Tanzania and Mozambique. The lower L frequencies on the eastern African coast are due to the southern migration of the Eurasian haplogroup M1, which is typical of East Africa. North Africa reaches L frequencies of 20-40%, while the Arabian Peninsula and the Near East have around 20-30% (only higher in Yemen).

The coalescence ages of the L sequences observed nowadays in North Africa show the young ancestry of these lineages, which originated in sub-Saharan Africa in the Holocene. This shows that sub-Saharan people did not leave traces in the maternal gene pool from the time of the settlement of North Africa, some 40,000 years ago. In all likelihood, the continued publication of complete L sequences from across Africa will reveal still younger ancestors shared between L sequences observed on both sides of the Sahara, bringing their introduction into North Africa to more recent, historical times.

It is also relevant that the interpolation analyses of haplogroups within the L pool across the Sahara revealed horizontal gradients that match, to a large extent, the known trans-Saharan routes. The West is dominated by L1b, L2b, L2c, L2d, L3b and L3d. The Center has L3e and some L3f and L3w. The East bears L0a, L3h, L3i, L3x and, in common with the Center, L3f and L3w. L2a is found almost everywhere, underlining its dominance in the slave package, not only towards the New World but also in the trans-Saharan trade.

Both lines of genetic evidence agree with historical data indicating that the introduction of the Asiatic horse into North Africa around 2,000 years ago lengthened the reach of desert nomads' raiding and trading. Before this period, the few black slaves taken from time to time across the Sahara would have been seen on the far side of the Mediterranean as mere exotic household ornaments. It may be argued, however, that there was no regular trans-Saharan trade system before the rise of the camel-mounted Berber nomad, in the first Christian centuries, and perhaps not even until after the arrival of the first camel-riding Muslim Arabs in North Africa, in the seventh century [39].