Introduction

The cynomolgus macaque (Macaca fascicularis) is an important animal model in preclinical biomedical research, especially in studies of immune-related diseases such as AIDS (Wiseman et al. 2007; Greene et al. 2008; Benferhat et al. 2009; Mee et al. 2009), tuberculosis (Reed et al. 2009), and malaria (Edstein et al. 2007), but also in studies of autoimmune diseases such as multiple sclerosis (Ma et al. 2009) and in transplantation research (Aoyama et al. 2009). A thorough knowledge of the major histocompatibility complex (MHC) of this Old World monkey species is therefore a prerequisite for such studies. The DRB region in various primate species displays both allelic variation (polymorphism) and diversity (variation in gene copy number and composition; Bontrop 2006). In humans, the number of DRB loci per haplotype varies from one to four, and five major region configurations with different gene numbers and content are known (DR8, DR1, DR51, DR52, and DR53), whereas nine region configurations have been defined in chimpanzees and more than 30 in rhesus macaques, with up to five and six DRB loci per haplotype, respectively (Mayer et al. 1992; Khazand et al. 1999; Doxiadis et al. 2007; de Groot et al. 2009). Many of the DRB alleles of rhesus (Mamu-DRB) and cynomolgus monkeys (Mafa-DRB) belong to loci/lineages shared between humans and macaques, namely DRB1, DRB3, DRB4, and DRB5, as well as DRB6, the latter appearing to be a pseudogene in all primate species studied. In addition, macaques possess loci/lineages for which no human equivalent is known. These are named DRB*W, and various DRB*W loci/lineages have been defined. In humans, the highly polymorphic DRB1 gene is present in each region configuration, whereas macaques also possess region configurations without a DRB1 gene or even with duplicated DRB1 genes. Within a given region configuration, DRB1 genes in macaques display limited polymorphism (Doxiadis et al. 2000).
In rhesus and cynomolgus macaques, two to three DRB loci per haplotype appear to be expressed. Untranscribed Mamu- and Mafa-DRB alleles may belong to different loci/lineages; even DRB1 alleles have been observed without a transcript (de Groot et al. 2004; Blancher et al. 2006).

In previous studies, the number of Mafa-DRB genes per haplotype was reported to vary from two to four (Blancher et al. 2006; Doxiadis et al. 2006a; O'Connor et al. 2007). Consistent with data obtained from mtDNA, Y chromosomes, and various autosomal markers (Smith et al. 2007; Tosi and Coke 2007; Blancher et al. 2008; Bonhomme et al. 2008), the Mafa-DRB region in animals from Mauritius displays limited polymorphism/diversity (O'Connor et al. 2007; Wiseman et al. 2007; Wojcechowskyj et al. 2007), owing to a founder effect. In contrast, the DRB region configurations of animals originating from Indochina and the Indonesian islands seem to be far more variable (Leuchte et al. 2004; Wei et al. 2007; de Groot et al. 2008). We wanted to determine whether the region configurations reported so far provide a comprehensive picture or whether only “the tip of the iceberg” has been observed. We therefore made use of the complex repeat D6S2878, which maps to intron 2 of all DRB (pseudo)genes that have an intact exon 2–intron 2 organization. Previous studies revealed that this microsatellite (DRB-STR) is present in various primate species (Riess et al. 1990; Epplen et al. 1997; Bergstrom et al. 1999; Kriener et al. 2000; Bak et al. 2006; Doxiadis et al. 2007). Genotyping of panels of humans (Doxiadis et al. 2007, 2009), chimpanzees (de Groot et al. 2009), rhesus macaques (Doxiadis et al. 2007), and cynomolgus macaques (de Groot et al. 2008) allowed the definition of unique haplotyping patterns in all four species. In the present study, a large panel comprising related and unrelated cynomolgus macaques was analyzed. Samples were first subjected to mitochondrial 12S rRNA sequencing to determine the geographic origin of the monkeys. Subsequently, DRB haplotyping was performed, followed by sequencing of all unreported Mafa-DRB alleles.

Materials and methods

Samples and genomic DNA isolation

For this study, DNA samples of 162 related and 68 unrelated cynomolgus macaques were analyzed. The related animals belong to an outbred breeding colony housed at the Biomedical Primate Research Centre (BPRC), The Netherlands, and are members of 11 pedigreed families that range in size from eight to 30 animals and span two to six generations. DNA from the unrelated animals, whose origin was unknown, was donated by various sources. Genomic DNA from the breeding group animals was extracted from EDTA blood samples or from immortalized B cell lines using a standard salting-out procedure.

mtDNA analysis

mtDNA was obtained from the genomic DNA preparations described above or was extracted from feces stored in 96% ethanol using the QIAamp DNA stool mini kit (QIAGEN GmbH, Germany) according to the manufacturer's recommendations. Amplification of part of the mitochondrial 12S rRNA gene, purification, and sequencing were performed essentially according to published methods (Doxiadis et al. 2003). The data were analyzed using the SeqMan program of the Lasergene software (DNASTAR, Madison, WI, USA). The seven unreported sequences (0201345, F108, Hoffa, Friko, 2321A, 6224, and 479), each obtained from at least two independent PCR reactions, have been deposited in the EMBL database (accession numbers: FN434196–FN434202). Accession numbers of all other mtDNA sequences have been published previously (Tosi et al. 2003; Doxiadis et al. 2006a; de Groot et al. 2008).

Phylogenetic analysis of mtDNA sequences

Multiple sequence alignments of the 12S rRNA mtDNA sequences were created using MacVector™ version 10.6.0 (Oxford Molecular Group). The evolutionary history of these sequences, together with published mtDNA sequences of cynomolgus macaques of known origin, was inferred using the neighbor-joining method. The bootstrap consensus tree inferred from 1,000 replicates is taken to represent the evolutionary history of the taxa analyzed. Evolutionary distances were computed using the maximum composite likelihood method (Tamura et al. 2004) and are expressed as the number of base substitutions per site. The final dataset comprised a total of 370 positions. Phylogenetic analyses were conducted using MEGA version 4.0 (Kumar et al. 2008).
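To make the tree-building step concrete, the neighbor-joining criterion can be sketched in Python. This is a minimal illustration, not the MEGA implementation: it assumes a precomputed symmetric distance matrix (a dict of dicts), leaves out the maximum composite likelihood distance model and the bootstrap, and the helper names `symmetric` and `neighbor_joining` are hypothetical.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a symmetric dict-of-dicts distance matrix from {(a, b): distance}."""
    d = {}
    for (a, b), dist in pairs.items():
        d.setdefault(a, {})[b] = dist
        d.setdefault(b, {})[a] = dist
    return d


def neighbor_joining(dist):
    """Return an unrooted Newick string built by neighbor joining.

    `dist` is a symmetric dict of dicts covering at least three taxa.
    """
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    nodes = list(d)
    if len(nodes) < 3:
        raise ValueError("neighbor joining needs at least three taxa")
    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence of each active node from all other active nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimizes the Q criterion (ties: first pair in input order).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        dij = d[i][j]
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))  # branch length to i
        lj = dij - li                                    # branch length to j
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"               # new internal node
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Resolve the last three nodes around a single central node.
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"
```

Applied to a small additive toy matrix (not data from this study), the sketch recovers the leaf branch lengths of the generating tree exactly. With real alignments, ties in the Q criterion are broken by input order, which is one reason bootstrap support is reported alongside the tree.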

STR-DRB genotyping

Amplification of the relevant DNA segment in cynomolgus macaques was performed as described for rhesus macaques, using the same primer sets (de Groot et al. 2008). Briefly, the relevant DNA segment in rhesus macaques was amplified with a forward primer located at the 3' end of exon 2 adjacent to intron 2 (5'Mamu-DRB-STR: TTC ACA GTG CAG CGG CGA GGT) and with labeled reverse primers in intron 2 (3'Mamu-DRB-STR_VIC: ACA CCT GTG CCC TCA GAA CT and 3'Mamu-DRB-STR_FAM_1007: ACA TCT GTG TCC TCA GAC CT). The labeled primers were synthesized by Applied Biosystems (Foster City, USA) and the unlabeled primers by Invitrogen (Paisley, Scotland). The PCR was performed in a 25-μl reaction volume containing 1 unit of Taq polymerase (Invitrogen, Paisley, Scotland), 0.6 μM of the unlabeled forward primer (5'Mamu-DRB-STR), 0.4 μM of the VIC-labeled reverse primer, 0.2 μM of the FAM-labeled reverse primer, 2.5 mM MgCl2, 0.2 mM of each dNTP, 1× PCR buffer II (Invitrogen, Paisley, Scotland), and 100 ng DNA. Cycling consisted of an initial denaturation step of 5 min at 94°C, followed by five cycles of 1 min at 94°C, 45 s at 58°C, and 45 s at 72°C, and then 25 cycles of 45 s at 94°C, 30 s at 58°C, and 45 s at 72°C. A final extension step was performed at 72°C for 30 min. The amplified DNA was prepared for genotyping according to the manufacturer's guidelines and analyzed on an ABI 3130 genetic analyzer (Applied Biosystems). STR analysis was performed using the GeneMapper program (Applied Biosystems), and all samples were analyzed at least twice.
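The cycling program described above can be written down as a small data structure, which makes its totals explicit. This Python sketch only encodes the times and temperatures stated in the text; the class and function names are hypothetical, and ramping time between temperatures (which depends on the thermocycler) is ignored.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Step:
    temp_c: float  # block temperature in degrees Celsius
    seconds: int   # hold time at that temperature


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: Tuple[Step, ...]


# DRB-STR cycling program as stated in the text.
DRB_STR_PROGRAM = (
    Stage(1, (Step(94, 300),)),                              # initial denaturation
    Stage(5, (Step(94, 60), Step(58, 45), Step(72, 45))),     # first cycling block
    Stage(25, (Step(94, 45), Step(58, 30), Step(72, 45))),    # second cycling block
    Stage(1, (Step(72, 1800),)),                             # final extension
)


def amplification_cycles(program):
    """Count cycles in stages with more than one step (the cycling blocks)."""
    return sum(stage.cycles for stage in program if len(stage.steps) > 1)


def hold_time_seconds(program):
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    return sum(stage.cycles * sum(step.seconds for step in stage.steps)
               for stage in program)
```

For this program, `amplification_cycles` returns 30 (5 + 25) and `hold_time_seconds` returns 5,850 s, i.e. 97.5 min of programmed hold time before ramping is added.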

PCR, cloning, and sequencing of DRB exon 2

One hundred and eighteen different Mafa-DRB alleles were sequenced from exon 2 to intron 2, including the microsatellite. For this purpose, we used the same primers and PCR conditions as described for rhesus macaques, and cloning and sequencing were also performed as published earlier (Doxiadis et al. 2007). The 28 unreported Mafa-DRB sequences have been deposited in the EMBL database [accession numbers: FN433698, FN433701, FN433704, FN433705, FN433708, FN433712, FN433716, FN433719-FN433721, FN433725, FN433729, FN433731, FN433734, FN433737-FN433745, FN433747, FN433749-FN433752] (Supplementary Table 1) and have been officially named by the IPD/MHC database (Klein et al. 1990; Robinson et al. 2003; Ellis et al. 2006).

Phylogenetic analysis of DRB exon 2 sequences

Multiple sequence alignments of exon 2 of all known Mafa-DRB sequences were created using MacVector™ version 10.6.0 (Oxford Molecular Group), followed by phylogenetic analysis with MEGA version 4.0 (Kumar et al. 2008) as described above for mtDNA, except that pairwise distances were computed using the Kimura 2-parameter model and the final dataset comprised 183 positions. The bootstrap consensus tree inferred from 2,000 replicates is taken to represent the evolutionary history of the taxa analyzed.
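The Kimura 2-parameter distance and the bootstrap resampling mentioned above can be sketched in Python. The distance follows the standard K2P formula, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitions and transversions among the compared sites. This is an illustrative sketch rather than MEGA's code: sites with gaps or ambiguity codes are skipped pair by pair (an assumption that may differ from how the 183-position dataset was assembled), and the function names are hypothetical.

```python
import math
import random

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites with gaps or ambiguity codes in either sequence are skipped for this pair.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    a, b = 1 - 2 * p - q, 1 - 2 * q
    if a <= 0 or b <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    return -0.5 * math.log(a) - 0.25 * math.log(b)


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap pseudo-replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}
```

A single transition in ten compared sites gives -1/2 ln(0.8) ≈ 0.1116. Drawing 2,000 replicates with `bootstrap_replicate` and rebuilding a tree from each is the general scheme behind bootstrap consensus values.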

Results and discussion

Geographic origin of cynomolgus macaques

The 3' segment of the mitochondrial 12S rRNA gene provides essential information for elucidating the geographic origin of different macaque populations (Tosi et al. 2003; Smith and McDonough 2005; Kyes et al. 2006; de Groot et al. 2008). The mtDNA sequences obtained, together with reference sequences from animals of known origin, were subjected to phylogenetic analysis. The resulting consensus tree showed a bifurcation into continental and insular lineages (Fig. 1), a result comparable to that of an earlier study of a limited number of samples (Doxiadis et al. 2006a) and to a report based on parts of the mitochondrial 16S rRNA gene and two genomic loci mapping to the Y chromosome (Tosi and Coke 2007). The cynomolgus macaques of the self-sustaining breeding colony appear to originate mainly from the Malaysian/Indonesian islands and partly from the Indochinese mainland (Fig. 1, bold). Half of the unrelated animals, however, have their roots on the islands and the other half on the mainland (Fig. 1, bold and italics). Within the continental mtDNA lineages, a split separates cynomolgus macaques from north of the Isthmus of Kra, including Cambodia, Thailand, and Vietnam, from those south of the isthmus, namely on the Malaysian peninsula. The mtDNA sequences of the unrelated animals, obtained from colonies of different sources, cluster within both branches, whereas three of the families of the self-sustaining breeding colony, represented by Sayonara, Alfa, and Cornea (Fig. 1, bold), originate from the Malaysian peninsula. The phylogenetic tree shows far more diversity within the continental mtDNA samples than within those representing the islands. Although no divergence times can be calculated, this observation is consistent with the proposed southward migration of cynomolgus macaques from mainland Indochina to the Indonesian islands (Blancher et al. 2008).
Moreover, the mtDNA phylogeny shows that including unrelated animals obtained from different sources further broadens the range of maternal origins represented among the cynomolgus macaques studied.

Fig. 1

Phylogenetic tree of mtDNA sequences of cynomolgus macaques. mtDNA sequences of cynomolgus macaques of the test panel are compared with those of cynomolgus monkeys of known origin; mtDNA sequences from rhesus macaques of Indian, Chinese, and Burmese origin are used as the outgroup. Brackets indicate the geographic clusters. Names of cynomolgus monkeys from families are shown in bold, and names of unrelated animals in bold italics. Bootstrap values <50 have been omitted.

Mafa-DRB allele and haplotype definition

All 162 pedigreed macaques included in this study are members of 11 families covering two to six generations and belong to a self-sustaining breeding colony. Their haplotypes could therefore be determined by segregation analysis. For the 68 unrelated animals, for which no segregation data are available, DRB haplotypes were assigned only if a given combination of DRB-STR pattern and exon 2 alleles was shared by at least two animals.
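The acceptance rule for unrelated animals (a pattern must be shared by at least two animals) can be sketched as follows. How candidate patterns were proposed is not described here, so this Python sketch simply assumes a list of candidate haplotypes, each a set of (DRB-STR length, exon 2 allele) markers; the type names, the function name, and the example markers are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical marker: one DRB-STR amplicon length (bp) linked to one exon 2 allele name.
Marker = Tuple[int, str]
Pattern = FrozenSet[Marker]


def supported_patterns(genotypes: Dict[str, Pattern],
                       candidates: Iterable[Pattern],
                       min_animals: int = 2) -> Dict[Pattern, int]:
    """Keep candidate haplotype patterns carried by at least `min_animals` animals.

    An animal carries a pattern if every marker of the pattern occurs in its genotype.
    """
    support = {}
    for pattern in set(candidates):
        n = sum(1 for genotype in genotypes.values() if pattern <= genotype)
        if n >= min_animals:
            support[pattern] = n
    return support
```

For example, with hypothetical animals u1 to u3, a candidate found in the genotypes of both u1 and u2 is retained with a support of 2, whereas a candidate seen only in u2 is discarded.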

As shown previously (de Groot et al. 2008), all Mafa-DRB genes with an intact exon 2–intron 2 segment possess the relevant microsatellite, and most of the corresponding DNA segments could be amplified successfully. Amplification failed in a few cases, mostly involving members of the DRB6 pseudogene family. The failure is probably caused by primer mismatches resulting from the genetic instability of this pseudogene (Doxiadis et al. 2008a). Two to ten amplicons, ranging in length from 143 to 362 bp (Table 1), were detected per cynomolgus macaque sample. If a combination of amplicons had not been observed before, exon 2 segments of the respective sample were cloned and sequenced to define unreported alleles and to link each DRB-STR length to its DRB gene/allele. In the cohort of 230 animals, a total of 118 DRB alleles were identified, including 28 unreported ones (Supplementary Table 1; Table 1, bold). A neighbor-joining tree was constructed from the Mafa-DRB exon 2 sequences (Fig. 2). The tree shows many relatively short branches, with the exception of one deep clade representing the alleles of the DRB6 pseudogene. In all cases, the DRB-STR amplicons could be unambiguously linked to a particular DRB allele and were found to segregate in a Mendelian manner. The composition of the DRB-STR is in keeping with the phylogeny of the exon 2 sequences (Doxiadis et al. 2007; de Groot et al. 2008). Since the microsatellite is highly variable in length and present in most DRB genes/alleles analyzed, each haplotype is characterized by a unique DRB-STR pattern representing its particular combination of DRB alleles. In this way, 49 different region configurations, differing in the number and content of their DRB genes/pseudogenes, could be defined (Table 1).
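Because every haplotype carries a characteristic set of DRB-STR lengths, checking Mendelian segregation in a family amounts to asking which maternal and paternal haplotype combine into the offspring's amplicon pattern. The Python sketch below illustrates that check under simplifying assumptions: haplotypes are reduced to sets of amplicon lengths, so two haplotypes sharing a length cannot be told apart by length alone, and the function name and example lengths are hypothetical.

```python
from itertools import product
from typing import FrozenSet, List, Sequence, Tuple

Haplotype = FrozenSet[int]  # set of DRB-STR amplicon lengths (bp) on one chromosome


def explain_offspring(offspring: FrozenSet[int],
                      maternal: Sequence[Haplotype],
                      paternal: Sequence[Haplotype]) -> List[Tuple[Haplotype, Haplotype]]:
    """Return every (maternal, paternal) haplotype pair whose combined amplicons
    reproduce the offspring's observed DRB-STR pattern.

    An empty list flags a non-Mendelian pattern (e.g. a new recombinant,
    a pedigree error, or amplification failure) that needs follow-up.
    """
    return [(m, p) for m, p in product(maternal, paternal) if m | p == offspring]
```

An empty result marks offspring that need closer inspection, for instance by cloning and sequencing of exon 2 as described above.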
Twenty-two of these configurations had been described previously in four of the 11 families housed at the BPRC (de Groot et al. 2008). Two of the previously reported configurations in fact turned out to be a single configuration encoding a total of five DRB alleles (Table 1, no. 15). A few of the 49 DRB region configurations have also been observed in monkeys at other primate centers (Leuchte et al. 2004; Blancher et al. 2006; Wei et al. 2007; Wiseman et al. 2007). In addition, a configuration described in Mauritian monkeys seems to be identical to one in our cohort (no. 28), apart from an additional DRB6 allele. Thus, this study resulted in the discovery of 28 unreported Mafa-DRB region configurations. Only four of the 49 configurations display allelic variation (Table 1, nos. 10a, b; 20a–c; 33a, b; and 44a, b), resulting in a total of 54 DRB haplotypes. Most of the unreported haplotypes were detected in the group of unrelated animals, although these monkeys are characterized by only nine different mtDNAs and no breeding information was available. This finding suggests that many more DRB haplotypes may be detected when animals of other origins are analyzed. The extremely high level of DRB region configuration diversity in cynomolgus monkeys most likely represents a species-specific strategy to cope with various pathogens.
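The haplotype total follows from simple bookkeeping: the 45 configurations without allelic variation contribute one haplotype each, and the four variable configurations contribute 2 + 3 + 2 + 2 haplotypes. A Python sketch of that count (the names are hypothetical; the numbers are those given in the text):

```python
TOTAL_CONFIGURATIONS = 49
# Configurations with allelic variation -> number of haplotypes each
# (10a, b; 20a-c; 33a, b; 44a, b)
ALLELIC_VARIANTS = {"10": 2, "20": 3, "33": 2, "44": 2}


def count_haplotypes(total_configurations, variants):
    """Invariant configurations count once; variable ones count once per allelic variant."""
    invariant = total_configurations - len(variants)
    return invariant + sum(variants.values())
```

With these values the function returns 45 + 9 = 54, so the number of haplotypes only slightly exceeds the number of region configurations.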

Table 1 Mafa-DRB haplotypes defined by exon 2 sequencing and DRB-STR genotyping
Fig. 2

Phylogenetic tree of all known Mafa-DRB exon 2 sequences. Phylogenetic analysis of exon 2 sequences of all 118 Mafa-DRB alleles was performed as described in Materials and methods; the exon 2 sequence of Caja-DRB*W1601 is used as the outgroup. Bootstrap values <50 have been omitted. Alleles such as DRB*W2001 that do not cluster with other alleles of the same lineage have been named according to their shared motif at the peptide-binding site (amino acids 9–13).

Generation of DRB region configurations by recombination-like events: a possible role of the DRB6 pseudogene

In humans, only five major DRB region configurations are known, designated DR8, DR1, DR51, DR52, and DR53 (Marsh et al. 2005), whereas nine have been defined in chimpanzees (de Groot et al. 2009) and about 30 in rhesus macaques, mostly of Indian origin (Slierendregt et al. 1994; Khazand et al. 1999; Doxiadis et al. 2000). One New World monkey species, the common marmoset (Callithrix jacchus), appears to lack region configuration polymorphism altogether (Antunes et al. 1998; Doxiadis et al. 2006b). Earlier publications suggested that the expansion and contraction of the DRB region in most primates resulted from unequal crossing-over events (Slierendregt et al. 1994; Doxiadis et al. 2000). However, the level of DRB region configuration polymorphism encountered in cynomolgus macaques, with 49 configurations detected in only 230 animals, is unprecedented. In contrast, each of the five human region configurations is extremely polymorphic, mainly owing to the HLA-DRB1 gene, which displays abundant allelic variation. As a consequence, many haplotypes have been established for each of the five HLA-DR region configurations. Allelism within a region configuration is far lower in rhesus macaques and is seldom seen in cynomolgus monkeys, in which the number of region configurations almost equals the number of haplotypes. Thus, depending on the species, the number of DRB region configurations and the degree of allelic polymorphism appear to be inversely related.

About half of the Mafa-DRB region configurations appear to share segments covering one or more alleles/loci; examples are given in Table 2. In one of these examples, not only the exon 2 sequences but also the lengths of the adjacent STRs are identical (Table 2, 2a/b). Other identical sets of exon 2 alleles segregate with slightly different STR lengths, in concordance with the notion that STRs evolve faster than the adjacent coding sequences (Doxiadis et al. 2007; de Groot et al. 2008). Many of the haplotype pairs that share certain DRB genes/alleles also appear to have a DRB6 pseudogene in common (examples are given in Table 2). Furthermore, 36 of the 49 DRB region configurations (nearly three-quarters) contain one and sometimes even two DRB6 pseudogenes. All contemporary primate DRB genes are thought to originate from an ancestral progenitor gene through several rounds of duplication (Bontrop et al. 1999). One of the duplicated progenitor genes appears to be the founder of the DRB6 gene/pseudogene, which is more than 58 million years old, since it is also present in prosimians, although it seems to have been lost in New World monkeys. The phylogenetic tree confirms that the DRB6 pseudogene is an old entity, since all Mafa-DRB6 alleles cluster far apart from the other DRB genes/alleles (Fig. 2), and Mamu-DRB6 sequences have been shown to cluster together with DRB6 alleles from other species such as humans and chimpanzees (Doxiadis et al. 2008b). Additionally, the DRB6 gene must have lost its function, at least its ability to encode a bona fide MHC class II gene product, very early in the evolution of the DRB region, because it is a truncated pseudogene in nearly all living primate species. Why such a highly polymorphic pseudogene has been retained in a multigenic region like the MHC over such a long time span is an intriguing question.
The pairs of Mafa-DRB region configurations that share partly identical genes/alleles (Table 2) suggest that recombination-like events occurred, promoted by DRB6 itself or its surrounding region. These recombinations seem to have taken place as unequal crossing-over events, since in some pairs both configurations carry duplicated DRB6 loci (Table 2, no. 1), whereas in others only one of them contains the duplicated DRB6 (Table 2, no. 4a). In addition, there are related sets of region configurations of which only one has a DRB6 pseudogene (Table 2, nos. 6a, 7b, and 8a), but there are also configurations with DRB6 as the only shared locus (Table 2, no. 9). A possible explanation for a recombination hot spot at the DRB6 locus is the presence of an endogenous retroviral sequence of more than 5,000 bp, HERVK3I, within intron 1 of all DRB6 and DRB2 pseudogenes studied so far in humans, chimpanzees, and rhesus macaques (Doxiadis et al. 2008a). HERV structures are well known to promote recombination and sequence-transduction processes (Deininger and Batzer 2002; Kazazian 2004), and their possible role in the contraction and expansion of the DR region has been suggested in the past (Andersson et al. 1998). In particular, retroviral structures integrated in sense orientation within introns, such as the HERVK3I sequences of the DRB6/DRB2 lineage, have been described as promoting recombination-like processes (van de Lagemaat et al. 2006; Doxiadis et al. 2008a). The observation that the common marmoset, a New World monkey that has no DRB6/DRB2 gene/pseudogene(s) and therefore no HERVK3I insertion, also lacks region configuration polymorphism is consistent with this hypothesis.
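The effect of a misaligned (unequal) crossover between two DRB6 copies can be illustrated with a toy model in which a region configuration is a list of locus labels. This Python sketch is purely illustrative: it ignores sequence-level details such as the HERVK3I element or the exact breakpoint, and the labels X, Y, and Z are hypothetical placeholders, not Mafa-DRB loci from Table 2.

```python
from typing import List, Tuple


def unequal_crossover(h1: List[str], i: int,
                      h2: List[str], j: int) -> Tuple[List[str], List[str]]:
    """Reciprocal exchange between locus h1[i] and locus h2[j].

    The paired loci must belong to the same lineage (same label) but may sit at
    different positions, i.e. the two chromosomes are misaligned. The breakpoint
    lies within the paired locus, so each product keeps one (hybrid) copy of it.
    """
    if h1[i] != h2[j]:
        raise ValueError("crossover requires paired loci of the same lineage")
    return h1[:i] + h2[j:], h2[:j] + h1[i:]
```

Pairing the second DRB6 copy of one chromosome with the first copy of its homolog yields one expanded product with three DRB6 copies and one contracted product with a single copy, the kind of expansion and contraction discussed above. Pairing copies at the same position of identical chromosomes simply regenerates the parental configuration.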
A recent report documenting hybrid DRB region configurations in humans, in which unequal recombination-like events appear to have taken place around DRB6 pseudogenes, also supports the theory that endogenous retroviral insertions, and probably other transposable elements, play a role in the plasticity of the DR region (Doxiadis et al. 2009). However, the existence of only nine different region configurations in the chimpanzee, nearly all of which contain a DRB6/DRB2 locus, seems at first glance to contradict a unique, recombination-promoting role of intronic HERVs. One possible explanation is that the chimpanzee as a species is far younger than the Old World monkeys and has not had enough evolutionary time to generate a larger number of region configurations. Furthermore, chimpanzees are known to have experienced a selective sweep targeting the MHC region about 2 million years ago (de Groot et al. 2002), and it is therefore likely that they lost some DRB region configurations as a result. Another scenario is that, in one of the first rounds of unequal crossing-over, other genes were generated in cynomolgus macaques, such as those of the DRB*W lineages that are not present in hominoids. In humans, DRB1 is the only DRB gene present on all haplotypes and therefore probably cannot be dispensed with, whereas in macaques some of the DRB*W genes may have replaced DRB1 as the prominent beta-chain-encoding DRB locus. In chimpanzees, a species older than humans in evolutionary terms, an intermediate situation can be observed: at least one haplotype lacks a DRB1 gene, and it is plausible that DRB3 and/or DRB4 have taken over its function (de Groot et al. 2009).
It therefore seems likely that, in the distant future, more region configurations will be generated in humans and great apes, with a subsequent loss of DRB1 as the main and most polymorphic DRB gene per chromosome.

Table 2 Emergence of new Mafa-DRB haplotypes through recombination-like events