Re-creation of the genetic composition of a founder population
Human ethnic groups are frequently comprised of two or more founder populations. One of these founding populations is often available for contemporary sampling. We describe a method for reconstructing the composition of a missing founder population using the highly informative haplotypes comprising the HLA system. An application of the method is demonstrated using bone marrow registry samples of African Americans. We use contemporary samples of African Americans and European Americans to derive haplotypes of the West African founder populations. This approach may also be useful for reconstructing ancestral haplotypes for regions elsewhere in the genome.
Keywords: Founder Population, Admix Population, Haplotype Inference, Admixture Proportion, Locus Haplotype
During human history, the process of spreading over first Africa and then the other continents divided humans into more or less discrete populations defined by distinct cultural practices. Periods of separation, selection and population isolation laid conditions for varying degrees of genetic differentiation (e.g. Johansson and Gyllensten 2008). The re-contact and admixture of these more or less discrete groups has resulted in new distinct populations comprised of two or more founding groups. Such admixed groups often constitute well-recognized ethnic categories in countries throughout the world.
Historically, recent admixture from populations of differing continental origins defines such groups in the USA and Brazil, for example. Yet the admixture of groups from intra-continental sources has also been a familiar aspect of human population history. For example, the largest human population group, the Han Chinese, have been shown to consist of distinct subpopulations reflecting diverse origins, partial isolation and subsequent admixture (Hu et al. 2007; Chen et al. 2008). Even within national boundaries, European populations are also the product of waves of invasions and subsequent admixture reflected in their current genetic composition. As we demonstrate here, it is straightforward and unambiguous to reconstruct greatly diverged parental populations.
The HLA system, including the loci of the human major histocompatibility complex at 6p21, is the most polymorphic system in the human genome. It evolves rapidly and as such constitutes an excellent marker system for identifying and following the parental population contributions to contemporary admixed groups. Here we describe and apply a method that, given HLA-typed samples of an admixed population and one of its founder populations, identifies the HLA haplotypes of the missing founder population.
Model for division of samples
In this formulation, the admixture rate M is estimated from outside sources. The preferred sources for admixture estimates are genomewide studies with pure parental population controls (e.g. Price et al. 2007; Risch 2006). The geographical scale of the estimate should be considered. For example, African American composition has been shown to vary across the United States, with African Americans from the US South having a higher fraction of African background genetics. Large-scale estimates can be achieved by averaging a set of small-scale studies in different locales to get a nationwide average, but care should be taken when using estimates from smaller population and sample locations.
While the vast majority of haplotypes are private to different continental populations, some haplotypes are found in both populations, but at differing frequencies. Our method handles both shared and private haplotypes.
Sampling variation may result in negative frequencies when the product of the admixture proportion M and the frequency of the haplotype in population PA exceeds the frequency of the haplotype in the admixed population PN. In these cases, the haplotype is estimated not to exist in PB, so the frequency of the haplotype in PB is set to zero. After the initial determination of all haplotype frequencies, PBi, the frequencies are normalized to sum to one.
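The calculation described above can be sketched in Python. This is a minimal illustration rather than the authors' implementation. It assumes the linear mixture model PN = M·PA + (1 − M)·PB, with M the proportion of the admixed population contributed by the available founder PA, which is consistent with the negative-frequency condition stated above. The function name, the dictionary layout and the toy frequencies are hypothetical.

```python
def reconstruct_founder(pn, pa, m):
    """Estimate haplotype frequencies of the missing founder population PB.

    pn: dict mapping haplotype -> frequency in the admixed population PN
    pa: dict mapping haplotype -> frequency in the available founder PA
    m:  admixture proportion contributed by PA (0 <= m < 1), taken from
        outside sources
    Assumes PN = m*PA + (1 - m)*PB (an assumption of this sketch).
    """
    if not 0.0 <= m < 1.0:
        raise ValueError("m must satisfy 0 <= m < 1")
    raw = {}
    for hap in set(pn) | set(pa):
        diff = pn.get(hap, 0.0) - m * pa.get(hap, 0.0)
        # Negative estimates are treated as "haplotype absent from PB".
        raw[hap] = max(diff, 0.0) / (1.0 - m)
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("no haplotype has a positive estimated frequency")
    # Renormalize so the estimated PB frequencies sum to one.
    return {hap: freq / total for hap, freq in raw.items()}


# Toy usage with invented frequencies (not data from the study).
pa_toy = {"A": 0.5, "B": 0.5}
pn_toy = {"A": 0.1, "B": 0.3, "C": 0.6}
pb_toy = reconstruct_founder(pn_toy, pa_toy, m=0.2)
for hap in sorted(pb_toy):
    print(hap, round(pb_toy[hap], 4))
```

Because the final step renormalizes, the division by (1 − M) does not change the result; it is kept only so that the intermediate values are on the scale of frequencies in PB.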
Selecting pure ancestral population samples
Estimating admixture from HLA data
In admixed populations with more than two founder populations, such as Caribbean Hispanic populations that have a mixture of African, Native American and European ancestry, the same method can be applied to calculate the frequencies of a single missing founder population when all the other founder populations have been characterized and admixture estimates provided for each component.
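The extension to several known founders can be sketched in the same way. The example below is an illustration under an assumed linear mixture model, in which the admixed frequency of each haplotype is the sum of the known founders' frequencies weighted by their admixture proportions, plus the missing founder's frequency weighted by the remaining proportion. The function name and the toy values are hypothetical.

```python
def reconstruct_missing_founder(pn, known_founders, proportions):
    """Estimate the frequencies of a single missing founder population.

    pn:             dict haplotype -> frequency in the admixed population
    known_founders: list of dicts, one per characterized founder population
    proportions:    list of admixture proportions, in the same order
    The missing founder is assumed to contribute 1 - sum(proportions).
    """
    if len(known_founders) != len(proportions):
        raise ValueError("one proportion is needed per known founder")
    missing_share = 1.0 - sum(proportions)
    if missing_share <= 0.0:
        raise ValueError("known proportions must sum to less than one")
    haplotypes = set(pn)
    for founder in known_founders:
        haplotypes |= set(founder)
    raw = {}
    for hap in haplotypes:
        # Portion of the admixed frequency explained by the known founders.
        explained = sum(m * f.get(hap, 0.0)
                        for m, f in zip(proportions, known_founders))
        # Negative estimates are set to zero, as in the two-founder case.
        raw[hap] = max(pn.get(hap, 0.0) - explained, 0.0) / missing_share
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("no haplotype has a positive estimated frequency")
    return {hap: value / total for hap, value in raw.items()}


# Toy usage with invented frequencies: two known founders, one missing.
founder_1 = {"A": 1.0}
founder_2 = {"B": 1.0}
admixed = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
estimate = reconstruct_missing_founder(admixed, [founder_1, founder_2],
                                       [0.5, 0.25])
for hap in sorted(estimate):
    print(hap, estimate[hap])
```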
The National Marrow Donor Program in Minneapolis, MN maintains a donor registry that includes African American and European American individuals. A total of 1,000 individuals from each registry group were randomly selected. We utilized National Marrow Donor Program data recently reported in the literature (Maiers et al. 2007).
HLA typing and haplotype inference
HLA typing was performed at the antigen or two-digit level of resolution at the loci HLA-A, HLA-B and HLA-DRB1 using DNA methods. Three-locus haplotypes, for example A*32-B*42-DRB1*03, are abbreviated 32:42:03.
For haplotype inference, we used standard methods adjusted to accommodate the large haplotype diversity present in the HLA system. We applied the expectation-maximization (EM) algorithm to infer three-locus HLA haplotypes from genotypes. Estimation of frequencies of rare haplotypes in founder populations is highly prone to error, and there are several possible sources. Inadequate sampling of populations results in frequencies that have wide error bounds due to statistical variation in sampling a small proportion of the overall population. Estimation error is an artifact of the EM algorithm, in which rare haplotypes in the sampled populations are difficult to ascertain due to lack of information. Admixture estimation error affects the calculated frequencies of the missing founder population, which depend on the accuracy of the admixture proportion assumed for the admixed population. Finally, some haplotypes may have been created by recombination or mutation after the merging of the two founder populations; this method assigns these haplotypes to the missing founder population.
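The EM step mentioned above can be illustrated with a generic textbook version of the algorithm for unphased multi-locus genotypes. This sketch is not the adjusted procedure used in the study, whose details are not given here. It assumes Hardy-Weinberg proportions, starts from uniform frequencies, and enumerates every phase resolution of each genotype, which is practical only for a small number of loci. All function names and the toy genotypes are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """Return all unordered haplotype pairs consistent with a genotype.

    genotype: tuple of (allele1, allele2) pairs, one pair per locus.
    """
    pairs = set()
    for flips in product((0, 1), repeat=len(genotype)):
        h1 = tuple(locus[f] for locus, f in zip(genotype, flips))
        h2 = tuple(locus[1 - f] for locus, f in zip(genotype, flips))
        pairs.add(tuple(sorted((h1, h2))))
    return pairs


def em_haplotype_frequencies(genotypes, max_iter=1000, tol=1e-9):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    expansions = [sorted(compatible_pairs(g)) for g in genotypes]
    haplotypes = {h for pairs in expansions for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(max_iter):
        counts = defaultdict(float)
        # E-step: share each genotype among its possible phase resolutions
        # in proportion to their probability under the current frequencies.
        for pairs in expansions:
            weights = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2)
                       for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), weight in zip(pairs, weights):
                counts[h1] += weight / total
                counts[h2] += weight / total
        # M-step: expected haplotype counts divided by chromosomes sampled.
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
        change = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if change < tol:
            break
    return freq


# Toy usage: two homozygous individuals and one triple heterozygote.
toy_genotypes = [
    (("01", "01"), ("08", "08"), ("03", "03")),
    (("02", "02"), ("44", "44"), ("04", "04")),
    (("01", "02"), ("08", "44"), ("03", "04")),
]
estimates = em_haplotype_frequencies(toy_genotypes)
for hap, f in sorted(estimates.items(), key=lambda item: -item[1])[:2]:
    print(":".join(hap), round(f, 4))
```

In the toy data the heterozygous individual is resolved almost entirely into the two haplotypes already seen in the homozygotes. This also illustrates the estimation error noted above: a haplotype that occurs only in ambiguous genotypes receives little support.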
The method of derivation of founder population haplotypes can be demonstrated with HLA-typed samples of European Americans and African Americans. African Americans are derived from West Africans and Europeans in proportions of approximately 80:20 (e.g. Zhu et al. 2005). This example is especially informative for these purposes because of the great (intercontinental) divergence in HLA haplotypes between the peoples of Africa and Europe (Mack and Erlich 2006). Infrequent but genuine haplotype similarities, or the possibility of low levels of African admixture in the European American sample (Shriver et al. 2003), will not detract from the utility of this example because of the substantial differences between the two founding populations. In order to estimate the HLA haplotype frequencies of the West African founder populations, we took samples of 1,000 individuals (2,000 haplotypes) typed at the “antigen level” (2-digit) for African American and European American donor samples from the National Marrow Donor Program registries (Maiers et al. 2007).
The ten most frequent West African HLA haplotypes, similarly arranged and compared with the frequencies of the African American and European American haplotypes, are shown in Fig. 2b. The most common West African haplotype is A*30-B*42-DRB1*03, present at a frequency of 0.021. African American frequencies averaged 92% of the estimated West African frequencies. The two European American haplotypes observed in the West African sample, 23:15:11 and 74:15:13, were quite rare, with frequencies of only 0.00054 and 0.00026, respectively, and may be due to haplotype estimation errors.
Although a fuller description of founder populations estimated from African Americans and other groups will be presented separately, some points are worth making at this time. This example, comparing differences in HLA frequencies between two continental regions, suggests that there may be complete population differentiation in HLA types at the continental level, with little or no sharing of haplotypes. Further underlining this point, the two-digit antigen level of HLA typing resolution presented here often contains a great deal of additional allelic variation, which can make a sizeable contribution to haplotypic variation. For example, the common alleles seen in Europeans, B*44 and DRB1*15, each consist of dozens of subtypes. A further source of HLA haplotype variation is present in the other histocompatibility loci of the HLA complex. We suggest that samples typed at high resolution and at additional HLA loci would further reduce instances of haplotype overlap between European and African source HLA haplotypes.
Historically admixed populations have gained attention in recent years because of their potential for admixture mapping of disease genes (Smith and O'Brien 2005; Patterson et al. 2004; Wang et al. 2008; Xu et al. 2008). Our goal in this contribution is to demonstrate a method to re-create the parental populations of an admixed group when one of the parental populations is available, an approach especially pertinent to HLA information. The HLA region of humans is composed of the highly polymorphic major histocompatibility loci distributed over a region of 3–4 Mb. The haplotype diversity of this region is much greater than the sum of the allelic variation from each of the 8–10 histocompatibility loci.
Population samples of HLA frequencies derived by this method can be of value in several respects. First, one or more of the founder populations of a contemporary group will often be unavailable or impossible to sample, making the reconstructed samples of unique value. In addition, a population’s HLA composition is an essential starting place for determining the sampling requirements for an ethnically specific bone marrow or stem cell registry, and in understanding the practical side of population differentiation for patient–donor matching. It is possible to stratify admixed groups based on inference of their HLA haplotypes coming from two ancestral sources or a single population. Patients with mixed ancestral HLA will be among the least likely to find a match because population samples with similar ancestral mixtures may be difficult to obtain.
This work describes a method of reconstructing the haplotype frequencies of a founder population. For this purpose, we use only a relatively small sample of available population data (1,000 European Americans and 1,000 African Americans) and limit the description of haplotypes from the derived population. A more complete and thorough study of founder HLA haplotypes from African American and other admixed populations will be reported separately, and will address relative subtleties of the method such as the adequacy and purity of an available founder population (e.g. European Americans) and present more substantial lists of derived haplotypes. Another issue to be addressed at that time is the apparent sharing of haplotypes from populations of intercontinental origins.
The evolution and modification of haplotypes of the HLA complex have been studied over many years, yet haplotype blocks present throughout the genome also evolve through the same variety of genetic mechanisms as seen in the HLA system. Our method applies not just to the HLA system but also to haplotype frequencies elsewhere in the genome. The HLA system is particularly remarkable for the availability of quality data and the population privacy of its haplotypes. Datasets are often available using other genetic marker systems, raising the possibility of this same type of analysis on SNPs or microsatellites. In fact, forays have already been made into analyzing admixed populations with a variety of genetic systems (Bertorelle and Excoffier 1998; Mountain et al. 2002; Choisy et al. 2004; Pfaff et al. 2004; Price et al. 2007). It appears that the HLA system may represent one end of the spectrum of population haplotype divergence in humans.
Supported in part by Office of Naval Research Grant N00014-08-1-0058.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
- Mack S, Erlich H (2006) Population relationships as inferred from classical HLA genes. 13th International Histocompatibility Workshop Anthropology/Human Genetic Diversity Joint Report in Immunobiology of the Human MHC. Proceedings of the 13th international histocompatibility workshop and congress. Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Price AL, Patterson N, Yu F, Cox DR, Waliszewska A, McDonald GJ, Tandon A, Schirmer C, Neubauer J, Bedoya G, Duque C, Villegas A, Bortolini MC, Salzano FM, Gallo C, Mazzotti G, Tello-Ruiz M, Riba L, Aguilar-Salinas CA, Canizales-Quinteros S, Menjivar M, Klitz W, Henderson B, Haiman CA, Winkler C, Tusie-Luna T, Ruiz-Linares A, Reich D (2007) A genomewide admixture map for Latino populations. Am J Hum Genet 80:1024–1036
- Wang S, Ray N, Rojas W, Parra MV, Bedoya G, Gallo C, Polleti G, Mazzotti G, Hill K, Hurtado AM, Camrena B, Nicolini H, Klitz W, Barrantes R, Molina JA, Freimer N, Bortolini MC, Salzano FM, Petzl-Erler ML, Tsuneto LT, Dipierri JE, Alfaro EI, Bailliet G, Bianchi NL, Llop E, Rothammer F, Excoffier L, Ruiz-Linares A (2008) Geographic patterns of genome admixture in Latin American mestizos. PLoS Genet 4(3):e1000037