Major histocompatibility complex (MHC) genes play a critical role in the immune system. MHC molecules present antigens from either intracellular threats (such as viruses and cancerous proteins, via class I molecules) or phagocytosed antigens (such as bacteria and parasites, via class II molecules) to T lymphocytes to initiate an adaptive immune response (Punt et al. 2018). In vertebrates, MHC allelic variation in a population has been linked to biological traits ranging from immune recognition and susceptibility to infectious and autoimmune diseases to ecological success through mating preferences and pregnancy outcomes (Sommer 2005). For the last remaining member of the family Phascolarctidae, the koala (Phascolarctos cinereus), survival against both endemic disease (from Chlamydia pecorum and potentially koala retrovirus) and population fragmentation/genetic bottlenecking has reached a crisis point (Australia 2011; Hemming et al. 2018). This has recently led to an increased focus on studying MHC genetic loci in koalas to understand their potential genetic resilience in the face of these ecological pressures.

There are 23 MHC class I and 23 MHC class II genes and pseudogenes annotated in the koala genome (Johnson et al. 2018). Detailed investigation into class I genes determined that 11 of these genes are actively transcribed in the koala, with three genes ubiquitously expressed as classical class Ia genes (Phci-UA, UB and UC) and eight genes with tissue-restricted expression as nonclassical class Ib genes (Phci-UD, UE, UF, UG, UH, UI, UJ and UK) (Cheng et al. 2018). All of the expressed MHC class I genes appear to be present as single copy genes in the genome (Cheng et al. 2018; Johnson et al. 2018). Within the MHC class II gene family, four class II subfamilies are recognized, consisting of alpha and beta subunits of DA, DB, DC and DM (Abts et al. 2018; Johnson et al. 2018; Lau et al. 2013). Studies investigating the allelic diversity of class II DA and DB genes have found that the beta subfamilies (DAB and DBB) contain more allelic diversity than the alpha subfamilies (DAA and DBA) (Lau et al. 2013). In addition, genome analysis and diversity studies indicate that DAB and DBB genes are each present at three distinct loci in the genome, while DCB and DMB genes are present as single copy genes (Johnson et al. 2018; Lau et al. 2013).

Several techniques have been used to identify the allelic diversity of MHC class I and II genes in wild and captive koala populations. Initial studies focused on class II DA and DB genes and utilized one-strand conformation polymorphism (OSCP) analysis (Lau et al. 2014a, 2014b, 2013). OSCP analysis involves PCR amplification of the target loci, lambda exonuclease digestion to remove the forward amplicon strand, acrylamide gel electrophoresis of the reverse amplicon strand to generate a banding pattern, and excision of individual bands for direct sequencing or cloning and sequencing to determine allele sequences (Lau et al. 2013). While this approach has the advantage of ensuring all alleles within an individual are detected (via each allele’s unique banding position in the gel), it is very labour intensive and low throughput. Later studies investigating class I UA, UB and UC genes or class II DA and DB genes opted for direct cloning and sequencing of PCR amplicons from the target loci (Cheng et al. 2018; Quigley et al. 2018). While this approach improved throughput, it could not guarantee that every allele for a tested gene was detected in the set of sequenced clones. Most recently, direct Illumina sequencing of PCR amplicons from the target loci was attempted for a range of class I and II gene loci (Abts et al. 2018). This approach allowed for higher throughput processing and greatly increased confidence that all the allelic diversity within a koala would be detected; however, sequence processing challenges related to amplicon size and multiple loci per gene limited the number of targets that generated reportable data. The field of koala MHC immunogenetics needs a comprehensive approach that combines the advantages of previous studies into a single, reliable, high-throughput technique. This study developed and tested such a technique (Fig. 1).

Fig. 1 Flowchart of high-throughput MHC allele determination method in koalas. The left panel summarizes the steps from sample acquisition to sequence generation while the right panel summarizes sequence processing to allele assignment (with example programs/commands necessary to complete each step given in parentheses)

In our current study, the immunogenetic profile of 82 captive koalas was generated in a high-throughput fashion. Blood samples were collected from koalas from Lone Pine Koala Sanctuary (Brisbane, Queensland, Australia) as part of routine health monitoring. DNA was extracted using the DNeasy Blood & Tissue kit (Qiagen) as per manufacturer’s instructions. Established PCR primers that target the receptor binding groove (exon 2 region) of class I UA and UC genes and class II DAB, DBB, DCB and DMB genes (Table 1) (Abts et al. 2018; Cheng et al. 2018; Lau et al. 2013) were used to generate locus-specific amplicons of 200–397 bp. To increase throughput and reduce costs, adaptor sequences were added to the 5′ end of each primer to allow for multiplex barcoding of koala samples for sequencing. The six MHC gene target amplicons from each koala were pooled for barcoding (generating one barcoded sample per koala) and all 82 koala samples were pooled for sequencing on a single MiSeq 250 bp paired end Illumina run (Ramaciotti Centre for Genomics, Sydney) (Fig. 1).

Table 1 PCR primers used to identify MHC alleles

To deconvolute the raw sequencing results into MHC alleles present per koala, the sequences obtained from each koala were sorted, merged into complete amplicon sequences, processed to extract highly repeated sequences and identified against a database of known koala alleles (Fig. 1). Sequence files from each koala were first separated into individual target gene files based on the PCR primer sequence for each target gene, trimmed to remove the primer sequences and culled to remove any reads shorter than 150 bp using the program cutadapt (Martin 2011). Next, paired forward and reverse reads were merged to reassemble the complete amplicon sequence using the program FLASH (Magoc and Salzberg 2011). The sequence data were then converted from Fastq to Fasta format using the standard unix ‘sed’ command. Finally, sequences were BLAST searched against a list of known koala MHC alleles using stand-alone BLAST (Altschul et al. 1990; Camacho et al. 2009). For each gene target, reference alleles that represented more than 10% of the total sequence reads for that gene were considered present in the koala. To detect novel MHC alleles not represented in the reference list, sequence files were separately tested for highly repetitive sequences with the program prinseq (Schmieder and Edwards 2011) and novel sequences were added to the reference list (Fig. 1).
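The format conversion and the 10% read-share rule described above can be illustrated with a short Python sketch. It is not the study's own script: it assumes four-line Fastq records and assumes that each merged read has already been assigned to its best-matching reference allele (for example, from tabular BLAST output). The function names, the input format and the allele labels are hypothetical.

```python
from collections import Counter


def fastq_to_fasta(fastq_lines):
    """Convert Fastq lines to Fasta lines.

    Assumes strict 4-line records (header, sequence, '+', qualities),
    which is the same assumption a simple sed one-liner makes.
    """
    fasta = []
    for i in range(0, len(fastq_lines) - 3, 4):
        header, seq = fastq_lines[i], fastq_lines[i + 1]
        fasta.append(">" + header[1:])  # swap the leading '@' for '>'
        fasta.append(seq)
    return fasta


def call_alleles(best_hits, threshold=0.10):
    """Return {allele: read share} for alleles above the threshold.

    best_hits holds one reference-allele label per read for a single
    gene target in a single koala (hypothetical input format).
    """
    counts = Counter(best_hits)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        allele: n / total
        for allele, n in counts.items()
        if n / total > threshold  # "more than 10%" is a strict inequality
    }


# Made-up read counts for one gene target in one animal.
hits = ["allele_A"] * 480 + ["allele_B"] * 450 + ["allele_C"] * 70
called = call_alleles(hits)
print(sorted(called))  # allele_C (7% of reads) falls below the cut-off
```

A share-based cut-off of this kind filters out low-frequency PCR and sequencing errors, at the cost of possibly missing a true allele that amplifies poorly.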

Using this high-throughput method, the allelic diversity of six MHC genes was determined for all 82 test koalas (Fig. 2). Overall, this population contained seven UA alleles (six novel), five UC alleles (three novel), 10 DAB alleles (one novel), eight DBB alleles, three DCB alleles (all novel) and four DMB alleles (all novel) (Fig. 2). Within these alleles, the expected range of 1 to 2 alleles per koala was retrieved from the single copy genes UC, DCB and DMB, and 1 to 6 alleles per koala were retrieved from the three-copy genes DAB and DBB. Interestingly, between 1 and 3 alleles per koala were retrieved from UA (a single copy gene). Sequence comparison revealed that the detected UA alleles designated UA*08:01 and UA*09:01 were identical to the previously published UB alleles UB*04:01 and UB*03:01, respectively (Fig. 2). This suggested that the UA PCR primer set was amplifying both UA and UB alleles, and that both gene loci are represented in the UA allele results. Phylogenetic analysis of class I sequences confirmed that UA and UB alleles are closely related, which prevented segregation of alleles by UA or UB gene origin (Fig. 2a).
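The expected ranges follow from diploidy: a gene present at n loci can contribute at most 2n distinct alleles per animal. A minimal Python sketch of this check is given below. The copy numbers are those stated in the text, while the function names and the example profile are illustrative assumptions.

```python
# Number of genomic loci per gene target, as stated in the text.
COPY_NUMBER = {"UA": 1, "UC": 1, "DAB": 3, "DBB": 3, "DCB": 1, "DMB": 1}


def max_expected_alleles(gene):
    """A diploid animal carries at most two alleles per locus."""
    return 2 * COPY_NUMBER[gene]


def flag_excess(profile):
    """Return genes with more distinct alleles than diploidy allows."""
    return [
        gene
        for gene, alleles in profile.items()
        if len(set(alleles)) > max_expected_alleles(gene)
    ]


# Hypothetical profile: three "UA" alleles exceed the single-copy maximum.
profile = {
    "UA": ["ua_1", "ua_2", "ua_3"],
    "UC": ["uc_1"],
    "DAB": ["dab_1", "dab_2", "dab_3", "dab_4"],
}
print(flag_excess(profile))
```

An excess of this kind at a nominally single copy gene is the pattern that pointed to co-amplification of a second locus by the UA primer set.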

Fig. 2 Phylogenetic relationships of known koala MHC gene alleles from class I (a) and class II (b). These maximum likelihood phylogenetic trees were generated using DNA sequences aligned with mafft (Katoh et al. 2002). ModelFinder (Kalyaanamoorthy et al. 2017) determined the best fit model (HKY + F + G4 for (a); TIMe + G4 for (b)), and IQ-TREE (Nguyen et al. 2015) with UFBoot2 (Hoang et al. 2018) constructed the tree with 1000 bootstrap replicates. Only bootstrap values above 70 are shown. Alleles highlighted in blue were detected in this study. The accession number for each allele is presented in parentheses after the allele name

Within the captive koala study group, there were 18 family units (dam-sire-joey) and an additional 18 parent-offspring pairs (either dam-joey or sire-joey). This allowed for a detailed evaluation of the accuracy of this high-throughput koala immunogenetic approach (Tables 2 and 3). Because MHC alleles follow Mendel’s law of segregation (offspring must inherit an allele from each parent), the genetic profile of an offspring should be composed of alleles found in its parents, and the accuracy of allele assignment within the family groups was assessed on this basis. Across all 36 family groups (18 trios and 18 pairs), there was 100% congruency between the parent(s)/joey genetic profiles for UA, DCB and DMB loci, 94% (34/36) congruency in DAB and DBB profiles and 86% (31/36) congruency in UC profiles (Tables 2 and 3). Of the resulting 216 comparisons (36 family groups × 6 gene targets), nine were inconsistent: five cases where the joey was missing an allele from either its dam or sire (four UC, one DBB), three cases where the joey possessed an allele neither parent possessed (two DAB, one DBB) and one case where the joey did not have an allele from either parent (UC). Together, these minor discrepancies correspond to an overall congruency rate of 96% (207/216).
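The family-based check and the overall rate can be sketched in Python as follows. The set-based rules are a simplified reading of the criteria described above and are an assumption: they treat each gene target as one pool of alleles and do not model which locus an allele belongs to for the multi-copy DAB and DBB genes. The function names are hypothetical, and the per-gene counts are taken from the percentages reported in the text.

```python
def trio_consistent(dam, sire, joey):
    """Joey shares an allele with each parent and has no allele absent from both."""
    dam, sire, joey = set(dam), set(sire), set(joey)
    return bool(joey & dam) and bool(joey & sire) and joey <= (dam | sire)


def pair_consistent(parent, joey):
    """With one known parent, only allele sharing can be checked."""
    return bool(set(parent) & set(joey))


# Congruent family groups per gene target (out of 36), from the text.
congruent = {"UA": 36, "UC": 31, "DAB": 34, "DBB": 34, "DCB": 36, "DMB": 36}
n_groups = 36
total = n_groups * len(congruent)            # 216 comparisons
overall = sum(congruent.values()) / total    # 207 / 216

print(f"{sum(congruent.values())}/{total} = {overall:.1%}")
```

For example, `trio_consistent(["A", "B"], ["B", "C"], ["A", "C"])` holds, whereas a joey carrying only `"A"` with a sire carrying only `"B"` is flagged as missing a paternal allele.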

Table 2 Immunogenetic profiles of dam-sire-joey family groups
Table 3 Immunogenetic profiles of parent-joey family pairs

To examine the immunogenetic diversity within this captive koala population, MHC haplotypes were clustered in R (R Core Team 2014) based on Gower’s coefficient of similarity (Gower 1971) computed with daisy, followed by complete-linkage clustering with hclust (Fig. 3). After sampling was done for this study, eight koalas developed cancer (primarily lymphoma or leukemia) and another six koalas died of natural causes (age-related). To determine if there were any associations between developing cancer and MHC alleles, both combined haplotype and individual allele prevalence were examined in this subset of deceased koalas. While there was no significant difference detected in the overall MHC haplotypes of these koalas (χ2 = 23.946, df = 29, p = 0.7316) (graphically seen in the lack of clustering by cause of death in Fig. 3), allele DBB*03 was significantly more prevalent in koalas that developed cancer (5/8; 63%) than in koalas that died of natural causes (0/6; 0%) (Fisher’s exact test, p = 0.031). It should be acknowledged that association of an MHC allele with neoplasia provides no evidence of causation, as this outcome could be related to another linked genetic or retroviral trait. As the sample size in this analysis was relatively small, monitoring will continue in these koalas and reanalysis will be undertaken when sample sizes are larger.
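The reported Fisher’s exact p value can be reproduced from the counts in the text with a short standard-library Python sketch. The 2 × 2 table (DBB*03 present/absent by cancer/natural causes) is built from the stated 5/8 and 0/6; the two-sided definition used here, summing all tables no more probable than the observed one, is an assumption about how the test was run, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme (no more probable) as observed;
    # the small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Rows: cancer (5 with DBB*03, 3 without); natural causes (0 with, 6 without).
p = fisher_exact_two_sided(5, 3, 0, 6)
print(round(p, 3))  # 0.031
```

With only 14 animals, the result rests on a handful of tables: 62 of the 2002 equally weighted arrangements are at least as extreme as the one observed.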

Fig. 3 MHC haplotype clustering of captive koalas in this study. Koalas that developed cancer (primarily lymphomas and leukemias) are indicated in red with red stars, while koalas that died from natural causes (related to old age) are indicated in blue with blue hearts

In conclusion, this study designed and tested a high-throughput protocol to determine the MHC allelic profile of koala classical class I and class II beta subfamily genes. Using established PCR primer sets, standard Illumina paired end sequencing and freely available software, this method achieved 96% congruence of allele assignment within 36 koala family groups over six MHC gene targets. Alleles detected in this study expanded the list of known koala MHC alleles, and an association between the presence of DBB*03 and the development of cancer was detected. This protocol offers a reliable method for expanded study in the important area of koala immunogenetics.