Background

The Atlantic slave trade resulted in the forced migration of an estimated 11 million Africans to the Americas. Only about 9 million are thought to have survived the passage, and many more died in the early years of captivity. Historical accounts indicate that virtually all enslaved Africans brought to North America came from either West or West Central Africa. A recent comparison of mtDNA sequences from 1148 African Americans living in the US with a database of African mtDNA sequences showed that more than 55% of the US lineages have a West African ancestor, while fewer than 41% came from West Central or South West Africa [1]. In North America, different constellations of African groups were brought to various staging areas [2]. Among the important staging areas for the arrival and distribution of enslaved Africans were the ports of Savannah, GA and Charleston, SC. Estimates of the origins of enslaved Africans received at these ports are presented in Figure 1. The largest regional contributions came from West Central Africa (40%; contemporary Angola, the Congos, Equatorial Guinea, and Gabon) and from the West African regions of Senegambia (23%; contemporary Senegal, Gambia, and northern Guinea) and Upper Guinea (18%; contemporary Guinea, Sierra Leone, and northwestern Liberia). Africans in the Carolina coast region were intentionally mixed to reduce the possibility of successful revolts and to facilitate their assimilation into plantation-slave society. The contemporary Gullah/Geechee culture emerged from these Africans and their descendants.

Figure 1

Proportions of enslaved Africans brought to historic Carolina coast ports from the 17th to 19th centuries CE (from Jackson, 2004 [2]).

Mitochondrial DNA (mtDNA) is inherited through the maternal line: a mother passes it to all of her children, but only daughters transmit it to the next generation, and few, if any, changes occur over many generations. It is therefore possible to compare contemporary African-American mtDNA haplotypes with contemporary mtDNA haplotypes in a worldwide database to obtain information about the ancestral origins of these mtDNAs. In such a comparison, continent-specific haplotypes are readily recognized, and assigning mtDNAs to a continent of origin is relatively straightforward. The more difficult task is to tie particular mtDNA haplotypes to specific geographical regions and ethnic groups within a continent. This task is particularly difficult for Africa, both because there is more genetic diversity among Africans than among people from any other continent and because humans have lived in Africa longer than anywhere else.

In principle, comparisons of individual mtDNA haplotypes could identify the geographical region, country, or even ethnic group from which a maternal ancestor came. However, the geographic distribution of mtDNA haplotypes is strongly influenced by the migration of individuals and population groups. These movements often result in the assimilation of people from other ethnic groups, and intermarriage also moves mtDNA haplotypes from one ethnic group to another. Over time, mtDNA haplotypes that originated in a single ethnic group become distributed among many ethnic groups. Despite these complications, mtDNA analyses for ancestry reconstruction are increasingly popular. Many people have had their mtDNA tested in the hope that the test will match their DNA to an mtDNA haplotype found in a particular ethnic group. For African Americans, who have been cut off from knowledge of their specific African roots, such a test might provide a clue about the ethnic group or country in Africa where one of their maternal ancestors originated. However, if identical mtDNA haplotypes are shared among many ethnic groups from different parts of Africa, it is impossible to use mtDNA sequence information alone to determine which single ethnic group was the source of a particular maternal ancestor. To date, no published assessment has quantified how often African-American mtDNAs exactly match mtDNAs from multiple African ethnic groups. We therefore compared samples of Carolina coast and other African-American mtDNAs with a database of sub-Saharan African mtDNAs to generate such an assessment.

Results

Database characterization

We assembled a database of 3645 mtDNA HVS-I sequences from the published literature and 80 additional sequences from our own (unpublished) studies of ethnic groups in Mali to generate a database of 3725 sequences. Only sequences from sub-Saharan Africa were included in the database, because North African mtDNAs are quite different from sub-Saharan mtDNAs [1] and few North American slaves are thought to have come from North African countries. Within the sub-Saharan database, more than 50% of the sequences were identical to a sequence from at least one other ethnic group. The remaining sequences either occurred multiple times within a single ethnic group or occurred only once in the database.

To provide a regional analysis of the database, samples were assigned to geographic regions as shown in Table 1 and Figure 2, and the percentages of within-region and among-region matches were determined. The West African region contributed 1528 (41%) of the sequences (Table 2). The sizes of the other regional groups ranged from 127 to 995 sequences. Overall, 40% of the sequences were present only once in the database or were found multiple times within a single ethnic group. In contrast, 24% of sequences were found in multiple ethnic groups from at least three geographical regions.
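The sharing categories used above can be illustrated with a short Python sketch. It is not the authors' software: it assumes one record per sequenced individual in the form (haplotype, ethnic group, region), and the function name `characterize`, the category labels, and the toy records (including the placeholder group names G1 to G6) are all hypothetical. Each sequence is classified by how many distinct ethnic groups and regions carry its haplotype.

```python
from collections import Counter, defaultdict

def characterize(records):
    """Fraction of database sequences in each sharing category.

    records: list of (haplotype, ethnic_group, region) tuples, one per
    sequenced individual (hypothetical format); haplotype is any hashable key.
    """
    groups = defaultdict(set)
    regions = defaultdict(set)
    for hap, group, region in records:
        groups[hap].add(group)
        regions[hap].add(region)

    counts = Counter()
    for hap, _, _ in records:
        if len(groups[hap]) == 1:
            # seen once, or several times but only within one ethnic group
            counts["single ethnic group"] += 1
        elif len(regions[hap]) == 1:
            counts["several groups, one region"] += 1
        elif len(regions[hap]) == 2:
            counts["two regions"] += 1
        else:
            counts["three or more regions"] += 1
    return {category: n / len(records) for category, n in counts.items()}

# Hypothetical toy records; haplotypes and group names are placeholders.
toy = [
    ("h1", "G1", "West"), ("h1", "G2", "West Central"), ("h1", "G3", "East"),
    ("h2", "G4", "West"), ("h2", "G4", "West"),
    ("h3", "G5", "West"),
    ("h4", "G6", "West"), ("h4", "G2", "West Central"),
]
print(characterize(toy))
```

Counting per sequence rather than per distinct haplotype mirrors the way the percentages above are expressed as fractions of all database sequences.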

Table 1 Definition of geographic regions.
Figure 2

Map depicting the geographic locations and the regional groupings of the population samples used in this study.

Table 2 Characteristics of the sub-Saharan mtDNA HVS-I database.

Two of the regional groupings, East and South, had an excess of sequences that were found in a single ethnic group, and a corresponding deficit of matches to sequences from multiple regions. This result is consistent with the idea that these two regions are dominated by samples that have low levels of the mtDNA haplotypes that are characteristic of the Bantu [4, 5]. In contrast, the majority of mtDNA sequences from Mozambique in the Southeast region match sequences from multiple regions, and only a small percentage of these sequences are unique to ethnic groups from Mozambique, perhaps reflecting the fact that only Bantu speakers were sampled [5, 6]. In support of this idea, most matches that include sequences from only two regions involve the West Central region that is believed to have been the original Bantu homeland [7].

Comparison of African-American samples with the sub-Saharan databases

Two African-American samples, a sample of African Americans who self-identified as Gullah/Geechee and a sample of African-American DNAs obtained from the Armed Forces DNA Identification Laboratory (AFDIL), were compared with both the original (smaller) database and the expanded database to show how increasing the database size affects the distribution of exact matches. The Gullah/Geechee people are an African-American microethnic group residing in the Georgia/South Carolina Lowcountry and coastal islands; their numbers are now estimated at between 200,000 and 500,000 in the Sea Islands of South Carolina, Georgia, North Florida, and beyond [8]. Gullah/Geechee culture includes distinctive practices and artefacts (e.g., coiled basketry, Brer Rabbit stories, praise houses) and a distinct linguistic style with roots among the Mende peoples of Sierra Leone, West Africa. When a sample of 74 Gullah/Geechee mtDNA sequences was compared with the sub-Saharan database, approximately half of the mtDNAs (39) were identical to mtDNAs from two or more ethnic groups, and only seven mtDNAs matched mtDNAs from a single ethnic group (Table 3). The remaining 28 mtDNAs were not identical to any sequence in the expanded database.

Table 3 Number of perfect matches to African-American HVS-I sequences.

Similar results were obtained when the 97 African-American AFDIL mtDNAs were compared with the databases. Approximately half (49) of the mtDNAs were identical to mtDNAs from multiple ethnic groups in the original database (Table 3). As with the Gullah/Geechee sample, fewer than 10% of the sequences matched a sequence from a single ethnic group, and 40% of the sequences had no perfect match in the database.
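The Table 3 categories amount to a simple tally of how many ethnic groups each African-American mtDNA matches. The sketch below is illustrative only: `tally` is a hypothetical helper, and the match sets are placeholders built to reproduce the reported Gullah/Geechee totals (74 sequences, 7 single-group matches, 28 without a match, and therefore 39 matches to two or more groups); the group labels are invented.

```python
from collections import Counter

def tally(match_sets):
    """Classify each query mtDNA by the set of ethnic groups it matches exactly."""
    counts = Counter()
    for groups in match_sets:
        if not groups:
            counts["no match"] += 1
        elif len(groups) == 1:
            counts["single ethnic group"] += 1
        else:
            counts["multiple ethnic groups"] += 1
    return counts

# Placeholder match sets reproducing the reported Gullah/Geechee totals;
# the group labels "g1" and "g2" are hypothetical.
gullah = [set()] * 28 + [{"g1"}] * 7 + [{"g1", "g2"}] * 39
counts = tally(gullah)
n = len(gullah)
for category in ("multiple ethnic groups", "single ethnic group", "no match"):
    print(f"{category}: {counts[category]} ({100 * counts[category] / n:.1f}%)")
```

With these totals, the single-group category is 7 of 74 sequences (about 9.5%), which is the "fewer than 10%" figure quoted for both samples.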

When the unmatched AFDIL and Gullah/Geechee mtDNAs were combined and analyzed further, 63% differed from a database sequence at a single nucleotide position (Table 4). Nearly three-quarters of these imperfect matches were to sequences that were found in multiple ethnic groups. Thus, most of the imperfect matches appear to be derived from the common haplotypes by a single mutational event.
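An imperfect match of the kind counted in Table 4 can be expressed as a one-position difference between two haplotypes. In this hypothetical sketch each haplotype is a set of variant positions relative to the reference sequence, so two haplotypes differ at exactly one position when their symmetric difference has one element. This representation is an assumption: it ignores which base is present, so it would miss two different substitutions at the same position. The index contents and the extra 16311 change in the query are invented for illustration.

```python
def differs_at_one_position(a, b):
    """True if two haplotypes (sets of variant positions) differ at exactly one position."""
    return len(a ^ b) == 1

def imperfect_matches(query, index):
    """Database haplotypes exactly one difference away from the query.

    index maps a haplotype (frozenset of variant positions) to the set of
    ethnic groups in which it was observed (hypothetical structure).
    """
    return {hap: groups for hap, groups in index.items()
            if differs_at_one_position(query, hap)}

# Hypothetical index and query; positions and group labels are illustrative only.
index = {
    frozenset({16223, 16278, 16294, 16309, 16390}): {"G1", "G2", "G3"},
    frozenset({16129, 16223}): {"G4"},
}
query = frozenset({16223, 16278, 16294, 16309, 16390, 16311})  # one extra change
hits = imperfect_matches(query, index)
shared = [hap for hap, groups in hits.items() if len(groups) > 1]
print(len(hits), len(shared))
```

Checking whether each one-step neighbour is itself found in several ethnic groups (`shared`) corresponds to the observation that most imperfect matches point to common, widely shared haplotypes.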

Table 4 Imperfect matches to the Gullah/Geechee and AFDIL African-American HVS-I sequences.

Geographical distribution of database matches

The majority of African-American mtDNAs that were identical to database mtDNAs matched mtDNAs from ethnic groups scattered throughout sub-Saharan Africa. However, 41% of the Gullah/Geechee and 37% of the AFDIL mtDNAs that matched database sequences were identical to mtDNAs found only in western (West plus West Central) Africa (Table 5). Only one Gullah/Geechee mtDNA and one AFDIL mtDNA matched mtDNAs found exclusively in eastern Africa in the sub-Saharan database. This distribution of matches is consistent with the historical evidence that most North American slaves originally came from western Africa. Most of the single-region matches to both the Gullah/Geechee and the AFDIL mtDNAs involved West African samples (Table 6). This result is consistent with historical records indicating that West Africa was a major source of American slaves, but it probably also reflects the fact that West African samples made up 41% of the expanded database. Surprisingly, five AFDIL mtDNAs matched only mtDNAs from the two Angolan samples, which together make up 4% of the database. This result is consistent with historical records indicating that a large proportion of the enslaved Africans brought to the Americas came from the Angola/Congo area of West Central Africa, and it suggests that ethnic groups in this part of Africa need to be sampled more extensively.
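The regional summaries in Tables 5 and 6 can be thought of as two questions asked of the set of regions in which a query's perfect matches occur: are all of them in western Africa, and is there exactly one region? The sketch below assumes region labels matching those used in the text ("West", "West Central", "East"); the helper names and the four query region sets are hypothetical.

```python
WESTERN = frozenset({"West", "West Central"})  # "western Africa" as defined in the text

def western_only(matched_regions):
    """True if every perfect match comes from West or West Central Africa."""
    return bool(matched_regions) and set(matched_regions) <= WESTERN

def single_region(matched_regions):
    """Name of the only region containing matches, or None."""
    regions = set(matched_regions)
    return next(iter(regions)) if len(regions) == 1 else None

# Hypothetical matched-region sets for four African-American mtDNAs
queries = [{"West"}, {"West", "West Central"}, {"West", "East"}, {"West Central"}]
frac_western = sum(western_only(q) for q in queries) / len(queries)
singles = [single_region(q) for q in queries if single_region(q)]
print(frac_western, singles)
```

Note that a single-region West match also counts as western-only, so the two summaries overlap, which is why single-region matches are tabulated separately.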

Table 5 Geographical source of mtDNA HVS-I matches.
Table 6 Distribution of single region matches.

Language group comparisons

Considering Africa's geographical size and population density, and the duration of human residence on the continent, linguistic diversity at the taxonomic level of the language family is remarkably low. This low diversity is probably a consequence of protracted mobility and interaction among Africa's indigenous groups, facilitated by the longstanding presence of organized political and social units such as kingdoms and empires, and by sociocultural practices such as polygamy.

Among the AFDIL sequences that matched more than five African ethnic groups, most of the linguistic diversity lay within the various subfamilies of the Niger-Congo family. These subfamilies include Atlantic-Congo (e.g., the Fula, Yoruba, Wolof, and Balanta ethnic groups) and Mande (e.g., the Mandingo, Mende, and Bambara ethnic groups). However, some sets of matches also included ethnic groups from entirely different language families, such as Afro-Asiatic (e.g., the Tuareg) and Nilo-Saharan (e.g., the Dinka), alongside members of the Niger-Congo family.

The most widely distributed pan-African haplotype (16189 16192 16223 16278 16294 16309 16390) belongs to haplogroup L2a1. This sequence is observed in West Africa among Niger-Congo speakers, including the Malinke, Wolof, and others; in North Africa among Afro-Asiatic speakers, including the Hausa and others; in Central Africa among Niger-Congo speakers, including the Bamileke and others; in South Africa among speakers of the Khoisan family, including the Khwe, and among Bantu speakers of the Niger-Congo family; and in East Africa among the Niger-Congo-speaking Kikuyu. Closely related variants are observed among Afro-Asiatic speakers, including the Tuareg in North and West Africa, and among the Nilo-Saharan-speaking Dinka of East Africa. Thus, identical mitochondrial haplotypes are often shared among ethnic groups that speak languages from very different families.

Discussion

Because only a small fraction of sub-Saharan African ethnic groups have been sampled, and some parts of sub-Saharan Africa are poorly represented in our database (Figure 2), our database cannot be considered a representative subset of the sub-Saharan mtDNA gene pool. In addition, a much larger database is clearly needed, since 40% of the African-American samples analyzed have no exact match in our database. The extensive sharing of mtDNA haplotypes among ethnic groups from different regions of Africa is consistent with the historical evidence of extensive migration and mixing of African ethnic groups. Indeed, the well-documented Bantu migrations appear to have had a major impact [4], as has the formation of the historic empires and kingdoms of the region (such as the Ghana, Mali, and Songhai empires and the Bakongo and Ashanti kingdoms). Despite the limitations of our database of sub-Saharan mtDNA sequences, it is likely that we have identified the most common haplotypes found in this region. Some are found throughout the area affected by the Bantu migrations, and others are found primarily in either the western or the eastern part of the continent. We intend to continue enlarging our database, because a significantly larger database would provide more information about haplotypes that occur at lower frequencies than the most common ones. Some of these lower-frequency haplotypes are likely to be shared among widely distributed ethnic groups, while others may have a more localized distribution.

Another way to assess our sub-Saharan mtDNA database is to see how well African-American mtDNAs match database sequences. Historical accounts of the trans-Atlantic slave trade indicate that most North American slaves came from the western coast of Africa, including the geographical regions from present-day Angola to Senegal. When African-American mtDNA HVS-I sequences were compared with the database, nearly half were identical to sequences from two or more African ethnic groups in our expanded database. Furthermore, when the database was enlarged by 53% to its present size of 3725 sequences, the average number of ethnic groups matched by each matching African-American mtDNA increased from 3.6 to 6.1. These results reflect the fact that approximately half of the mtDNA sequences in our sub-Saharan database are shared by members of three or more ethnic groups.
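The size of the original database is not stated in this section, but it can be back-calculated from the reported 53% increase. The arithmetic below assumes that 53% was rounded to the nearest whole percent, so the true growth lies between 52.5% and 53.5%; it also compares the growth in ethnic groups matched per mtDNA (3.6 to 6.1) with the growth of the database.

```python
import math

expanded = 3725          # present database size (sequences)
reported_growth = 0.53   # "increased by 53%", assumed rounded to a whole percent

central = expanded / (1 + reported_growth)
# Any growth that rounds to 53% lies in [52.5%, 53.5%)
smallest = math.ceil(expanded / 1.535)
largest = math.floor(expanded / 1.525)
print(round(central), smallest, largest)

# Relative increase in ethnic groups matched per matching mtDNA
print(f"{6.1 / 3.6 - 1:.0%}")
```

Under this assumption the original database held roughly 2,430 to 2,440 sequences, and the number of ethnic groups matched per mtDNA rose by about 69%, somewhat faster than the database itself grew.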

In both African-American samples, approximately 40% of the mtDNA sequences did not match any sequence in the database (Table 3). However, more than half of these unmatched sequences differed from multiple database sequences at a single position (Table 4). Because it is unlikely that more than a few of these differences arose from new mutations in North America, or that more than a few lineages went extinct in Africa after being introduced to the New World, this result suggests that only a small fraction of the mtDNA diversity present in sub-Saharan Africa has been sampled, and that much of the unsampled diversity consists of haplotypes derived from the common haplotypes by single mutations.

Many African Americans are interested in learning more about their African roots and are willing to pay to have their mtDNA analyzed in the hope that it will match DNA from a particular African ethnic group. However, because more than half of the mtDNA sequences in the African database are identical to sequences from other ethnic groups, African-American mtDNAs are much more likely to match sequences from multiple ethnic groups than sequences from a single ethnic group. Coupled with the finding that 40% of African-American mtDNAs did not match any sequence in the database, this makes it clear that a match to a single African ethnic group will not be the outcome for most African Americans; and even when a match to a single ethnic group is obtained, matches to additional groups may appear as the database grows. Furthermore, for the typical African American, the maternal ancestor who was the source of the mtDNA was just one of hundreds of enslaved African ancestors. In fact, it is likely that there has been more mixing of African ethnic groups in the Americas than has ever occurred elsewhere. Thus, the ancestors of virtually all contemporary African Americans came from a large number of ethnic groups located throughout the region from Senegal to Angola.
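The point that the mtDNA donor is one of hundreds of ancestors follows from pedigree doubling: n generations back there are 2**n ancestor positions, and the mtDNA follows only one of them, the strictly maternal line. The sketch assumes a 25-year generation length (an assumption, not a figure from this study), and the positions counted are genealogical slots rather than distinct people, since pedigree collapse and non-African ancestors reduce the number of distinct enslaved African ancestors.

```python
GENERATION_YEARS = 25  # assumed average generation length (not from the study)

def ancestor_slots(generations):
    """Number of genealogical ancestor positions n generations back (2**n)."""
    return 2 ** generations

for years in (150, 200, 250):
    n = years // GENERATION_YEARS
    slots = ancestor_slots(n)
    # mtDNA traces exactly one of these positions: the all-female line
    print(f"{years} years ~ {n} generations: {slots} ancestor positions, "
          f"mtDNA follows 1 of them ({1 / slots:.2%})")
```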

Conclusion

Half of the sub-Saharan mtDNA sequences in our database are common haplotypes that are shared among ethnic groups from multiple regions of sub-Saharan Africa. The finding that fewer than 10% of African-American mtDNAs matched mtDNA sequences from a single African ethnic group suggests that only about one in ten African Americans may be able to trace their mtDNA lineage to a particular ethnic group in Africa. No firm conclusions should be drawn, however, until a much larger database is available. What is already clear is that nearly half of contemporary African-American mtDNAs are identical to African haplotypes found in multiple ethnic groups throughout sub-Saharan Africa. For these mtDNAs, it is impossible to use mtDNA sequence information alone to determine which single ethnic group was the source of the maternal ancestor.

Methods

African-American samples

A sample of 78 African Americans who self-identified as Gullah/Geechee was collected by our laboratories from unrelated people in the coastal areas of South Carolina and Georgia, using either cheek swabs or mouthwash to collect buccal cells. DNA was isolated using a BuccalAmp DNA Extraction Kit (Epicentre, Madison, WI) for the cheek swabs or a DNAzol procedure (Molecular Research Center, Cincinnati, OH) for the mouthwash samples. The HVS-I region was amplified and sequenced as described previously [3]. Four mtDNAs with non-African haplogroups, three Native American (two haplogroup B and one haplogroup A2) and one European (haplogroup H), were excluded from further analysis, leaving 74 sequences (Table 9). A second sample of 104 African-American mtDNA sequences was obtained from Tom Parsons at the Armed Forces DNA Identification Laboratory. From this sample, seven mtDNAs with non-African haplogroups (five haplogroup H, one haplogroup J, and one haplogroup U4) were excluded, leaving 97 sequences.
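The exclusion step can be checked against the sample sizes used in the Results. This sketch filters on exactly the non-African haplogroups named above; the placeholder label "L" stands in for the retained African haplogroup calls, whose individual assignments are not reproduced here, and `retain_african` is a hypothetical helper.

```python
# Haplogroups named in the text as non-African and excluded
NON_AFRICAN = {"A2", "B", "H", "J", "U4"}

def retain_african(haplogroups):
    """Drop sequences assigned to the excluded non-African haplogroups."""
    return [hg for hg in haplogroups if hg not in NON_AFRICAN]

# Placeholder calls reproducing the reported sample compositions;
# "L" is a stand-in for each retained African haplogroup.
gullah = ["L"] * 74 + ["B", "B", "A2", "H"]
afdil = ["L"] * 97 + ["H"] * 5 + ["J", "U4"]
print(len(gullah), len(retain_african(gullah)))
print(len(afdil), len(retain_african(afdil)))
```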

Table 9 Gullah/Geechee mitochondrial DNA HVS-I sequences included in this study.

Database assembly

A database of 3725 mtDNA HVS-I sequences from people living in sub-Saharan Africa was assembled in October 2005 from 3645 sequences in the published literature (Table 7) and 80 new mtDNA sequences from people belonging to the Malinke and Bambara ethnic groups in Mali (Table 8). DNA from these latter samples was isolated using a BuccalAmp DNA Extraction Kit (Epicentre, Madison, WI) from cheek swabs obtained from unrelated volunteers. MtDNA HVS-I sequences from the two African-American population samples were then compared with these databases to determine how often individual HVS-I sequences are identical to African HVS-I sequences in the databases. For these comparisons, only positions 16030 to 16420 were considered, and both insertions and differences at positions 16182 and 16183 were ignored. In addition, a change to 16390A was inferred for all L2 haplogroup sequences that did not report this mutation. No attempt was made to correct any other errors that might be present among the published sequences. Because sequencing errors would tend to reduce the incidence of perfect matches, the frequencies of perfect matches we observe should be considered minimum estimates. Matches to multiple individuals within an African ethnic group were counted as a single match. Sequences included in the databases are available from Bert Ely.
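The comparison rules above can be sketched in Python. This is a minimal illustration, not the authors' software. It assumes each HVS-I sequence is stored as a list of variant tokens relative to the reference sequence (e.g. "16223", "16390A", or "16193.1C" for an insertion), and it keeps only variant positions, so it assumes two sequences that differ from the reference at the same position carry the same base there. The function names `normalize`, `build_index`, and `perfect_matches` and the toy records are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: variant tokens look like "16223", "16390A" or "16193.1C" (insertion).
# Only positions are kept; the substituted base is not compared.
WINDOW = (16030, 16420)              # compared region, inclusive
IGNORED_POSITIONS = {16182, 16183}
_TOKEN = re.compile(r"^(\d+)(\.\d+)?")

def normalize(variants, haplogroup):
    """Reduce a sequence to the set of variant positions used for exact comparison."""
    kept = set()
    for token in variants:
        m = _TOKEN.match(token)
        if m is None:
            raise ValueError(f"unrecognized variant token: {token!r}")
        if m.group(2):               # insertion such as 16193.1C: ignored
            continue
        pos = int(m.group(1))
        if pos in IGNORED_POSITIONS or not WINDOW[0] <= pos <= WINDOW[1]:
            continue
        kept.add(pos)
    if haplogroup.startswith("L2"):  # infer 16390A when it was not reported
        kept.add(16390)
    return frozenset(kept)

def build_index(database):
    """Map each normalized haplotype to the set of ethnic groups carrying it.

    database: iterable of (variants, haplogroup, ethnic_group) tuples.
    """
    index = defaultdict(set)
    for variants, haplogroup, group in database:
        index[normalize(variants, haplogroup)].add(group)
    return index

def perfect_matches(variants, haplogroup, index):
    """Ethnic groups with an identical normalized sequence (each group counted once)."""
    return set(index.get(normalize(variants, haplogroup), ()))

# Hypothetical toy records: two individuals from Group A, one from Group B
# whose record omits 16390.
db = [
    (["16223", "16278", "16294", "16309", "16390"], "L2a1", "Group A"),
    (["16223", "16278", "16294", "16309", "16390"], "L2a1", "Group A"),
    (["16223", "16278", "16294", "16309"], "L2a1", "Group B"),
]
index = build_index(db)
query = ["16182C", "16223", "16278", "16294", "16309", "16390"]
print(sorted(perfect_matches(query, "L2a1", index)))
```

Storing ethnic groups in a set for each haplotype implements the rule that several matching individuals from one ethnic group count as a single match, and the L2 inference lets the Group B record match despite the missing 16390 entry.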

Table 7 Mitochondrial DNA HVS-I sequences included in this study.
Table 8 Malinke and Bambara mitochondrial DNA HVS-I sequences included in this study.