Background

The North African mitochondrial DNA (mtDNA) genetic pool has been shown to reflect influence from different regions, with sizeable portions of lineages from Sub-Saharan Africa, the Middle East, and others that diversified perhaps first in Europe [1–10], a pattern also shown with autosomal data [11]. The geographic patterns of some of the haplogroups that constitute the North African mtDNA pool have been singled out as being more informative about early population histories than others; for example, the variation in haplogroup U6 [1, 12], a haplogroup that has been termed “the main indigenous North African cluster” [13], and, to a lesser extent, the variation in M1, which is found predominantly in Eastern Africa/Ethiopia [14–16]. U6 and M1 share the feature of being African-specific sub-clades of haplogroups otherwise spread only in non-African populations. Indeed, whilst most U clades are found in North Africa and in Eurasia, as far as the Ganges Basin, U6 is virtually restricted to North (West) Africa. For macro-haplogroup M, this African connection is even more puzzling, as haplogroups belonging to M are mostly found only in South, Central and East Asia, the Americas and Oceania, where no M1 has yet been reported.

The Palaeolithic archaeological record of North Africa is spatially and temporally diverse, revealing a variety of technological shifts during the later Pleistocene period. The Aterian, a regional variant of the Middle Palaeolithic (or Middle Stone Age), was previously thought to have existed ~40,000–20,000 years ago (KYA), and argued to mark the earliest modern humans in North Africa. These dates have been drastically reassessed and the upper bound is now closer to ~115 KYA [17] or even as old as ~145 KYA [18]. The transition from the Middle Palaeolithic to Upper Palaeolithic in North Africa is characterised by the appearance of the “Dabban”, an industry that is restricted to Cyrenaica in northeast Libya and represented at the caves of Hagfet ed Dabba and Haua Fteah [19]. Whilst a techno-typological shift occurred within the Dabban ~33 KYA [19], starker changes in the archaeological record occurred throughout North Africa and Southwest Asia ~23-20 KYA, represented by the widespread appearance of backed bladelet technologies. The appearance of these backed bladelet industries more or less coincides with the timing of the Last Glacial Maximum (LGM) (~23-18 KYA), including: ~21 KYA in Upper Egypt [20]; ~20 KYA at Haua Fteah with the Oranian [21]; the Iberomaurusian expansion in the Jebel Gharbi ~20 KYA [22]; and the first Iberomaurusian at Tamar Hat in Algeria ~20 KYA [23]. The earliest Iberomaurusian sites in Morocco appear to be only slightly younger ~18 KYA [24]. Whilst backed bladelet production is broadly shared across the different regions of North and East Africa, there was also a level of regional cultural diversity during this period, possibly mirroring a diversification of populations. The Sahara Desert expanded considerably during the LGM, perhaps concentrating human groups along the North African coastal belt and the Nile Valley. Climatic conditions improved in North Africa ~15 KYA, marking the beginning of a dramatic arid-to-humid transition [25]. 
This increase in humidity may have opened up ecological corridors, connecting North and Sub-Saharan Africa and allowing population dispersals between the two regions. An additional arid-humid transition occurred at 11.5–11 KYA [25]; this period coincides with a widespread change in the archaeological record that marks the beginning of Capsian lithic technologies. The Capsian is argued to have developed in situ in North Africa, marking a continuity from the Iberomaurusian and Oranian into the Capsian [21, 24, 26].

Given the geographical specificity of mtDNA haplogroups U6 and M1, some studies have investigated their potential implication in the peopling of North Africa [5, 27–30], whilst some earlier studies assumed that M1 diverged from other M lineages prior to the early dispersals of Homo sapiens out of Africa ~60–70 KYA [14, 15]. However, most research that has followed explains its presence in Africa by a back-migration from Asia [5, 31]. Dating of the U6 and M1 variation in African and Middle Eastern populations has been at the centre of the debate on the timing of the back-migration to Africa and, in particular, whether these haplogroups co-dispersed with certain archaeological cultures or languages. A thorough study by Olivieri and co-authors [29] proposed that both M1 and U6 were involved in an early dispersal, 40–45 KYA, from Southwest Asia to North Africa in association with the first arrival of anatomically modern humans in the Mediterranean region. Considering this time frame, it was furthermore suggested that the spread of Aurignacian culture in Europe and the Dabban industry in North Africa derived from the same source. This interpretation was questioned by Forster and Romano who, referring to the geographic correlates, proposed that the spread of these haplogroups could alternatively be explained by more recent events, perhaps contemporary with the dispersal of populations speaking Afro-Asiatic (AA) languages [32].

In this study, we re-evaluate the timeframe for M1 and U6 variations and their patterns of geographic spread at the resolution of complete mtDNA sequences using a range of phylogeographic and statistical methods. We try to assess to what extent the phylogeographies of U6 and M1 are correlated with each other and, indirectly, with the spread of AA languages. In order to address these questions, a survey of more than 5700 mtDNAs was undertaken, covering a broad geographic region encompassing North and East Africa, the Near and Middle East and the Caucasus. Complete mtDNA sequences were determined for 24 M1 and 33 U6 samples and, with the refined phylogenetic trees for M1 and U6 drawn, this information was used to genotype a further 131 M1 and 91 U6 samples of different geographic origin.

Results

Phylogeny, phylogeography and coalescent estimates of M1 and U6

Our genotyping of haplogroup U6 and M1 defining markers, analysed in combination with published data, confirmed earlier findings that these two haplogroups are present all over the Mediterranean Basin: both are particularly prevalent in the southern Mediterranean and M1 reaches as far away as East Africa (Figures 1a and b). Yet, some of their peak frequencies only partially overlap in Northwest Africa. In contrast to high frequencies of M1 sub-clades, haplogroup U6 is rare in East/Northeast Africa and the Middle East, and is virtually absent in the Caucasus (Table 1). Nevertheless, both haplogroups are by and large confined to the area where AA languages are spoken nowadays, being rare or absent in areas where other language families are dominant (Figure 1).

Table 1 Frequency of Haplogroups M1 and U6 in the geographic regions from this study
Figure 1

Spatial distribution of haplogroups M1 and U6, with language phyla. Frequency maps were obtained using Surfer 8 (Golden Software, Inc.). The Kriging procedure was used, and the dataset combines existing data [29], the data in [27, 28] and the data from the present study. Figure 1a: frequency map for haplogroup M1. Figure 1b: frequency map for haplogroup U6. Red dots indicate the geographic locations of the populations.

Concerning the estimated coalescent ages, Table 2 shows an excerpt of Additional file 1 and contains only those coalescent ages relevant in a broader context, whilst Figure 2 shows a schematic tree of the M1 and U6 phylogenies (see Additional file 2, Additional file 3, Additional file 4 and Additional file 5 for detailed phylogenies). The use of a different method for estimating molecular coalescent ages (e.g. using only the synonymous mutations rather than all the mutations present in the mtDNA coding region; see [33]) gives younger results than previously published [27–29] for both haplogroups, with the coalescence of U6 at ~35 KYA and M1 at ~29 KYA. U6 is mostly prevalent in Northwest Africa (Additional file 4 and Additional file 5), as is M1b; this contrasts with M1a, the most diverse sub-clade of M1, most of whose sub-clades are prevalent in East Africa. M1b and M1a have similar coalescent ages around the LGM: ~20 and ~21 KYA, respectively. M1a1 is the most diverse clade of M1a and is found in virtually all the populations where M1 has been sampled (except in Guinea-Bissau). Again, a variety of its sub-clades are more frequent in East Africa and, interestingly, a large subset of M1a1 samples could not be ascribed to any of its known sub-clades (Additional file 3). It is noteworthy that all the Caucasian samples fall into just one sub-clade, M1a1b2, with no variation present at the intermediate level of resolution (Additional file 3), the signature of a likely founder effect.

Table 2 Coalescent age estimates for M1 and U6 and some of the most frequent sub-clades
Figure 2

Schematic tree of haplogroups M1 and U6. The tree, rooted in L3, shows the major sub-haplogroups of M1 and U6. The branching order is phylogenetically correct, but the branch lengths are not to scale.

The most diversified sub-clade of U6 is U6a, largely due to the richness of its sub-clades in Northwest Africa. One of its sub-clades, U6a2, has been so far detected only in East African and Middle Eastern populations. Contrary to M1, various clades of U6 predate the LGM, including U6a, which is very close to the overall age of U6 (~33 KYA vs. ~36 KYA). Confirming some previous observations [27, 29, 30], U6b and U6c were confined in our samples to Northwest Africa.

Bayesian Skyline Plot analyses

We tested our panel of full sequences for expansion signal(s) using Bayesian Skyline Plots (BSP), which estimate past effective population size (Ne) dynamics on the basis of sequence data [35]. The method does not rely on any pre-specified parametric model of demographic history. However, its results should be interpreted with caution, as the curve representing Ne could also reflect changes in the sub-structure of the population rather than true variation in its size [36, 37], and the reconstruction of Ne might also be biased by purifying selection acting on the mtDNA genome [33, 34, 38, 39]. Yet, as both lineages here have a similar ratio of non-synonymous to synonymous mutations (0.63), this effect is not likely to explain the differences that we have found. Figure 3 displays the BSPs for M1 and U6. For each simulation, the median of the other haplogroup is overlaid for comparison. We also indicate the coalescent ages and the 95% CI of some sub-clades based on the full genome as in [34]; hence the coalescent ages reported in this section may differ from those in the previous section. The rate of Soares et al. [34] is applied here because the entire mtDNA genomic sequence is used for the BSP analyses, whereas the rate of Loogväli et al. [33] applies only to the coding region. Nonetheless, the two approaches offer similar estimates (see Table 2 and Additional file 1 for more detail).

Figure 3

Bayesian Skyline Plot for Haplogroups M1 and U6. The BSPs show the variation of the Effective Population Size (Ne) through Time for M1 (Figure 3a) and U6 (Figure 3b) based on the full mitochondrial genomes. The axis scales are identical for both plots. For comparison, the median of the second haplogroup is shown in grey, but not the 95% HPD. Overlaid on the plots are the coalescent ages of some relevant sub-haplogroups, with the vertical bars indicating the calculated coalescent ages (using the calculator from [34]) and the horizontal ones their 95% confidence interval.

For U6, the initial expansion seems to more or less coincide with the ~26–27 KYA estimated coalescent age (based on full sequences) of U6a, the most diverse and prevalent sub-clade of U6. This expansion appears to have continued at a somewhat equal rate, gradually slowing down, until the curve even drops slightly, and eventually a new expansion phase takes place around ~6–7 KYA. For M1, the slope of the curve is steeper, with two clearly visible expansion phases. The first inflexion is ~22 KYA, slightly older than the estimated coalescent ages for M1a and M1b, with a strong increase until reaching a plateau at ~15 KYA. The second phase occurs at ~10–11 KYA, a time around which the estimated coalescent ages of various sub-clades of M1 fall (e.g., M1b1 and M1a1b). By directly comparing the median curves of U6 and M1, representing the past population dynamics extracted from the molecular data, it appears unlikely that the demographic histories of these haplogroups entirely overlap, both in terms of the timing of expansion phases, as well as the magnitude of these expansions.

Mantel correlation tests

To explore whether the frequencies of M1 and U6 across a geographic range of populations correlate with languages, we used Mantel correlation tests. Notably, when M1 and U6 are grouped, or for U6 alone, no significant correlation is found between genes and either geography or language (Table 3). Only for M1 is a correlation found with both geography and language, and it is higher with geography than with language. To see which M1 clade contributes the most to this signal, the tests were repeated with M1a and M1b treated separately. No correlation could be found between M1b and geography and/or language, whilst M1a was significantly correlated with both geography and language.

Table 3 Mantel test to assess the correlation between genes and geography and/or language

Discussion

Origins of M1 and U6, their implications in the colonisation of North Africa, and some of its archaeological landmarks

A Southwest Asian origin has been proposed for U6 and M1 [27–29]. Yet, this claim remains speculative unless some novel “earlier” Southwest Asian-specific clades, distinct from the known haplogroups, are found in which the M1 and U6 lineages described so far are nested. Claims for basal mutations shared with M1 have recently been made in the case of haplogroups M51 and M20 (both East Asian-specific clades [40, 41]): they share a root mutation (C14110T) with M1. However, one should be cautious with phylogenetic inferences drawn from these findings because this mutation is not unique in the phylogeny of mtDNA: it also occurs in the background of non-M haplogroups, and therefore identity by descent within haplogroup M remains uncertain. Unfortunately, the sampling of extant populations of Africa and West Asia may not solve the question of their origin.

Assuming that M1 and U6 were introduced to Africa by a dispersal event from Asia, it would be difficult to accept their involvement in the first demographic spread of anatomically modern humans around 40–45 KYA, as suggested by Olivieri et al. [29], who associated these two clades with the spread of the Dabban industry in Africa. It has indeed been previously suggested that the colonisation of North Africa from the Levant took place during the early Upper Palaeolithic, as marked by the “Dabban” industry in North Africa [42]. However, comparison of early Upper Palaeolithic artefacts from Haua Fteah and Ksar Akil does not support the notion that the early Dabban of Cyrenaica is evidence of a population migration from the Levant into North Africa [43]. Marks [44] also noted differences between the two areas in terms of the methods of blade production, further arguing against a demographic connection between the regions. Likewise, the new coalescent date estimates for M1 obtained in this study are not compatible with the model implying the spread of M1 in Africa during the Early Upper Palaeolithic, 40–45 KYA.

Given the sequence data from 242 complete sequences and genotype data of 222 mtDNAs, we were unable to find conclusive evidence that any of the geographic regions of Africa or the Middle East would stand out as being uniquely or even significantly enriched with deep-rooted clades of U6 and M1 not found elsewhere. Whilst several U6 sub-clades seem to be confined to Northwest Africa, this pattern may be the result of drift and founder effects over many thousands of years and does not necessarily suggest that Northwest Africa was the geographic source of U6 dispersals in Africa. Similarly, in the case of M1b1, the Northwest African frequency pattern is apparent, whilst its counterpart, M1a, is widely spread around the Mediterranean Basin, and its current diversity is highest in East Africa. The age estimates of M1b and U6a1 (~20 KYA), together with their Northwest African spread patterns, are more consistent with their appearance during or after the spread of the Iberomaurusian culture than with an earlier spread of the Dabban industry. Furthermore, there is no evidence that the Dabban industry spread to Northwest Africa, as indicated earlier [43, 44]. When the most recent common ancestor estimates of mtDNA haplogroups are taken at face value and compared with relevant archaeological horizons, the Capsian culture also appears to be a possible candidate for the co-spread of sub-clades U6b and M1b1.

Although mtDNA is a single locus, some parallels concerning the African expansion of M1 and U6 can be drawn from autosomal data. In a recent study, Behar and colleagues explored the genome-wide diversity of the Jewish Diaspora with regard to that of their host populations, as well as the Middle East [45]. In their supplemental figure four, results of analyses undertaken with the software ADMIXTURE are shown, and specifically at K=10, an ancestry component depicted in deep purple colour appears. Interestingly, its proportion is particularly high amongst Mozabite Berbers, who have very high frequencies of M1 and U6 [12]. This deep purple colour is also present at a fairly high frequency amongst Moroccans, and to a lesser extent amongst Ethiopians, both Jewish and non-Jewish, and Egyptians. Its proportion in the Near Eastern populations is by far smaller than in the African ones.

Mimicry of M1 and U6

A mimicry between U6 and M1 has been suggested [28, 29]. Both are likely derived from a non-African ancestral clade at a similar time depth and both are largely confined to North and East Africa and the Middle East in their present-day geographic distribution. It seems, however, that the mimicry breaks down when the coalescent times and frequency patterns of their sub-clades are analysed in further detail. Even at the general level, U6 is hardly found outside Northwest Africa, whilst M1 is ubiquitous throughout North Africa, East Africa and the Middle East, also reaching the northern Caucasus. The coalescent age for U6a is almost 10,000 years older than that for either M1a or M1b, and most of its sub-clades coalesce before or around the LGM. In contrast, most of the estimates for M1a and M1b sub-clades are post-LGM. Also, the BSP analyses show that M1 and U6 have probably experienced different demographic histories. While the curves representing the median Ne for U6 and M1 overlap when the 95% HPD is taken into account, the median curves themselves do differ. The earlier age of U6 is apparent, and whereas the U6 median follows a rather steady rate until declining, M1 bears testimony to two distinct expansion events. Although haplogroup U6 also experienced two expansion events, they do not coincide with those of M1. It should be noted that the U6 curve should be interpreted with caution, as close to one half of the full U6 sequences used are from Europe. When geography is taken into account and the BSP simulations are run by separate regions, it appears that the decline around 8–9 KYA is almost entirely driven by the European sequences (see Additional file 6). Unfortunately, it was not possible to ascertain whether some of the signals present for M1 are also regional, because the number of regional sequences is too low. However, the proportion of “geographic outliers” in M1 is lower than in the case of U6.

M1, U6 and the Afro-Asiatic language family

It has been proposed that M1 and U6, or some of their sub-clades, could be linked with the spread of AA languages [27, 29, 31]. The main argument for this is that their geographical spread broadly overlaps with regions where AA languages are spoken today. There are currently two hypotheses about where AA languages originated. One places the origin in Northeast Africa, on the coast of the Red Sea [46, 47], linking the reconstructed proto-Afro-Asiatic vocabulary to pre-Neolithic cultures in the Levant and their predecessors in southeast Egypt and northeastern Sudan, whilst the second places it in the Levant [48] and emphasises the Neolithic component in the Afro-Asiatic cognates. Notably, even the earliest time frame (~10 KYA or more) considered by linguists [47, 49] for the earliest splits in the language family is more recent than the ages of U6 or M1 and their major sub-clades. For sub-clades of M1 and U6 to have been involved in the dispersal event associated with the Afro-Asiatic languages, they had to exist already when this event began; the fact that these sub-clades are older therefore makes them plausible candidates for such a dispersal. However, when considering M1 and U6 as a whole, or U6 alone, no correlation with language (and geography) was found with the current data, indicating for U6 that its expansion was not concomitant with that of the AA languages.

Concerning haplogroup M1 individually, a significant correlation with languages was observed. Furthermore, within M1, it appears that the correlation is mostly due to M1a. However, given the small sample size of M1b, any potential signal correlating with language might not be detectable. Interestingly, M1a has a likely East African origin, but its coalescent age of ~21 KYA still largely predates that of proto-AA. A sub-clade of M1a might still give a similar correlation, but there are not sufficient samples to allow M1a to be split into its various sub-clades and tested for a correlation. Although we found a correlation, limited sample sizes do not allow an unambiguous connection to be drawn between genes and languages. Furthermore, it is also possible that this putative sub-clade of M1 does not testify to the expansion of AA-speaking people, but was already present among the people who inhabited the area before the spread of the AA languages.

Conclusions

Our analyses do not support the model according to which mtDNA haplogroups M1 and U6 represent an early dispersal event of anatomically modern humans at around 40–45 KYA in association with the spread of the Dabban industry in North Africa, as proposed earlier [28, 29]. A West Asian origin for these haplogroups still remains a viable hypothesis, as sister clades of U (and, ancestral to it, macro-haplogroup N (including R)) and M are spread overwhelmingly outside Africa, notably in Eurasia, even though the phylogeographic data on extant populations do not present clear support for it. Our estimates of coalescent times and demographic analyses of U6 and M1 variation suggest that their spread in North and East Africa is largely due to a number of demographic events, predominantly occurring at the end of the LGM peak as well as after it, but largely before the Holocene. Hence, some of the topologically earliest sub-clades of U6 and M1 may have been involved in the origin and spread of the essentially North African Iberomaurusian culture, and the observed correlations with languages make it likely that the North and East African carriers of the two matrilineages were absorbed into the expanding Afro-Asiatic-speaking populations in the area, but in phylogeographically differential ways.

Methods

Samples

From over 5700 samples spanning Europe and countries around the Mediterranean Basin and beyond, 153 M1 and 121 U6 samples were identified based on their HVSI variation and then confirmed by RFLP (all unrelated individuals, who gave their informed consent). Samples from the literature/GenBank were retrieved, including: 77 M1 (2 from [50], 1 from [51], 3 from [52], 8 from [28], 1 from [53], 2 from [39], 1 from [54], 2 from [55], 2 from [56], 51 from [29], 1 from [57], 3 from [58] and 3 from [59]); and 93 U6 (1 from [60], 6 samples from Family Tree DNA deposited in GenBank, 1 from [61], 2 from [53], 1 from [62], 12 from [27], 30 from [29], 2 from [57], 39 from [30] and 7 from [58]). 9 samples were corrected (See Additional file 7 for the corrected positions) compared to their current GenBank entry at the time of this article’s submission, including 2 from [28], 1 from [55] and 5 from [27] (Dr. Vicente Cabrera’s personal communication). Also, 2 M1 and 3 U6 samples were kindly provided by Family Tree DNA (with some U6 samples having a potential match to sequences deposited in GenBank, see Additional file 7 and its legend for more details), bringing the total to 236 and 230 samples for M1 and U6 respectively (See Additional file 7 for detailed information). All the work complied with the Helsinki Declaration of Ethical Principles (59th WMA General Assembly, Seoul October 2008). The Estonian Basic Research project SF0182474 was approved by the Research Ethics Committee of the Estonian Biocentre.

Sequencing, SNP typing

The 153 M1 samples from this study were screened for approximately 2 kb of coding region in 4 separate fragments (between nps 700–1080, 6250–6990, 12590–13146, 14750–15580) chosen to cover some of the SNPs defining sub-clades of M1 based on previous knowledge [16, 55]. 22 samples were fully sequenced following a previously published protocol [63] with slight modifications. Based on the tree drawn from 105 full (or nearly full) sequences (Additional file 2), some SNPs were typed in order to place all the samples precisely on the tree (see Additional file 3 and Additional file 8 for the full typing information).

For the 121 U6 samples, several fragments have been amplified to type SNPs of interest based on the samples’ HVS I information (See Additional file 9 for full typing information), as well as from the tree based on 139 full sequences (See Additional file 4).

Phylogenetic tree, network

The trees and network were drawn by hand and checked with Network 4.5.1.0 (http://www.fluxus-technology.com/) [64]. Where needed, a weighting scheme was used for highly recurrent polymorphisms.

Coalescent age estimates

For the coalescent age calculations, the rho (ρ) statistic and its standard deviation were used as in [65, 66] (but see [67] for a critical assessment). Different rates were used: for the coding region, the rate from [33] was used, and for the full genome, estimates were calculated with the calculator provided in [34]. For all calculations, 2 M1 samples from [50] and the 3 M1 samples from [52] were discarded: the former were missing several portions of the coding region, and the latter seemed to exhibit potential sequence errors (see [68] for details). For the full genome calculations, further samples were discarded because their control region is not reported (M1: 1 from [53] and 2 from [39]; U6: 2 from [53] and 1 from [62]).
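As an illustration of the ρ statistic, the following minimal Python sketch computes ρ (the mean number of mutations separating the sampled sequences from the root of a clade) and its standard deviation in the commonly used form σ = sqrt(Σ nᵢ²·lᵢ)/n, where lᵢ is the number of mutations on branch i and nᵢ the number of sampled sequences descending from it. The toy tree and the linear `HYPOTHETICAL_YEARS_PER_MUTATION` value are assumptions made for illustration only; they are not the rates of [33] or [34], and the calculator of [34] applies a non-linear correction that is not reproduced here.

```python
from math import sqrt


def rho_with_sd(branches, n_samples):
    """Return (rho, sd) for a rooted clade.

    branches: iterable of (n_descendants, n_mutations) pairs, one per branch
              below the root of the clade.
    n_samples: total number of sampled sequences in the clade.
    """
    branches = list(branches)
    # rho: average number of mutations from the root to each sampled sequence
    rho = sum(n_i * l_i for n_i, l_i in branches) / n_samples
    # sd: sqrt(sum(n_i^2 * l_i)) / n
    sd = sqrt(sum(n_i * n_i * l_i for n_i, l_i in branches)) / n_samples
    return rho, sd


# Hypothetical toy clade with 4 sequences: one internal branch (1 mutation)
# leading to 2 sequences that each carry 1 private mutation, plus 2 sequences
# that each carry 1 mutation directly from the root.
toy_branches = [(2, 1), (1, 1), (1, 1), (1, 1), (1, 1)]
rho, sd = rho_with_sd(toy_branches, n_samples=4)

# Placeholder linear rate, NOT taken from the paper or its references.
HYPOTHETICAL_YEARS_PER_MUTATION = 5000.0
age_years = rho * HYPOTHETICAL_YEARS_PER_MUTATION
age_sd_years = sd * HYPOTHETICAL_YEARS_PER_MUTATION
```

With a linear rate, the age and its standard deviation scale directly with ρ and σ; this holds for a coding-region style rate, whereas the full-genome estimates described above depend on the calculator of [34].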

Mantel test

The haplotype of each sample was composed of all the polymorphisms detected in the coding region during the genotyping of each haplogroup, with the missing polymorphisms assumed to match the RSRS [58], plus the control region (16024–16400). HVS II was excluded from the haplotype, as it was only sequenced in some samples and, unlike for the coding region, it cannot be reasonably assumed that a specific polymorphism is absent in a different sample. For some populations the sample size was small, in which case they were grouped with a close geographic neighbour sharing the same language family. If this was not possible, the samples/populations were excluded (see Table 3 for details). The genetic distance matrices were based on Slatkin’s linearised FSTs. Because the language families present in the data are too divergent to rank and order, we used a binary approach, with populations (or grouped populations) speaking a language from the same language family given a distance of 0, and a distance of 1 otherwise. Mantel tests were done with Arlequin 3.5.1.2 [69], with 100,000 permutations.
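The procedure described above can be sketched in a few lines of Python. This is not the Arlequin implementation: it is a simple Mantel test that assumes a Pearson correlation between the upper triangles of two symmetric distance matrices and a one-sided permutation p-value. The helper names, the seed and the toy population labels are hypothetical; the binary language coding follows the text, and Slatkin’s linearisation is taken as FST / (1 − FST).

```python
import random
from math import sqrt


def slatkin_linearised(fst):
    """Slatkin's linearised FST: FST / (1 - FST)."""
    return fst / (1.0 - fst)


def language_distance(families):
    """Binary matrix: 0 if two populations share a language family, else 1."""
    return [[0 if a == b else 1 for b in families] for a in families]


def _upper(matrix, order):
    """Upper-triangle entries after relabelling rows/columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    # Assumes neither vector is constant (otherwise the denominator is zero).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(genetic, other, permutations=10000, seed=0):
    """Return (r, p): observed correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(genetic)))
    y = _upper(other, identity)
    observed = _pearson(_upper(genetic, identity), y)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute the population labels of one matrix
        if _pearson(_upper(genetic, order), y) >= observed:
            hits += 1
    # Count the observed arrangement as one of the permutations
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy input: four populations from two language families.
families = ["AA", "AA", "other", "other"]
lang = language_distance(families)
toy_fst = [[0.0, 0.02, 0.20, 0.25],
           [0.02, 0.0, 0.22, 0.18],
           [0.20, 0.22, 0.0, 0.03],
           [0.25, 0.18, 0.03, 0.0]]
genetic = [[slatkin_linearised(f) for f in row] for row in toy_fst]
r, p = mantel(genetic, lang, permutations=999)
```

A geographic distance matrix can be passed as `other` in the same way; partial Mantel tests, which control for one matrix while testing another, are not covered by this sketch.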

Bayesian Skyline Plot

The Bayesian Skyline Plot (BSP) [35] is a graphical depiction of the variation of the effective population size (Ne) through time. BSP analyses were performed with the software BEAST v1.5.4 [70]. The GTR substitution model was used with a gamma distribution, plus invariant sites. An uncorrelated lognormal relaxed clock was applied [71]. The whole mitochondrial genome was used, and runs were performed for 40,000,000 generations with 20 groups. In order to ensure that convergence was reached, several independent runs were done (see Additional file 10 and Additional file 11 for M1 and U6, respectively). The impact of the number of groups, which is user-defined, was also explored in increments of 5, from 5 to 50 (see Additional file 12 and Additional file 13 for M1 and U6, respectively). The axes were converted into their final units (effective population size vs. time) with a rate of 1.695 × 10⁻⁸ [34] and a generation time of 25 years. However, in order to take into account the purifying selection acting on the whole molecule, ρ was deduced from the data and then entered into the calculator provided in [34], resulting in a time scale that can be compared with the coalescent ages calculated in the same way. Accordingly, the samples that were not available in full, that had some missing parts, or that might suffer from errors (see the paragraph on coalescent age estimates) were not included in the analyses. Additional file 14 and Additional file 15 show the differences when the overall rate vs. the rate taking into account purifying selection [34] is used for plotting the results; the major impact is upon the time axis, whereas the impact on the effective population size and on the overall shape of the curve is minimal.
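The axis conversion with the overall rate can be illustrated with a short Python sketch. It assumes that the raw skyline output is expressed in substitutions per site (that is, that the clock rate was fixed to 1 during the run), which the text does not state explicitly; under that assumption, time in years is t / rate and Ne is θ / (rate × generation time). The function name and the toy values are hypothetical, and the purifying-selection correction obtained through the calculator of [34] is not reproduced.

```python
def rescale_skyline(times_subs, thetas, rate_per_site_per_year, generation_years):
    """Convert raw skyline axes into years and effective population size.

    times_subs: time points in substitutions per site (assumed raw unit).
    thetas: population-size values in the same raw unit (Ne * generation * rate).
    """
    years = [t / rate_per_site_per_year for t in times_subs]
    ne = [th / (rate_per_site_per_year * generation_years) for th in thetas]
    return years, ne


RATE = 1.695e-8          # overall rate as quoted in the text [34]
GENERATION_YEARS = 25.0  # generation time as quoted in the text

# Hypothetical raw values, chosen only to show the arithmetic.
raw_times = [0.0, 1.695e-4]
raw_thetas = [4.2375e-4, 8.475e-4]
years, ne = rescale_skyline(raw_times, raw_thetas, RATE, GENERATION_YEARS)
```

Changing the rate rescales both axes, but the choice of generation time only affects the Ne axis.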