The evolution of the natural killer complex; a comparison between mammals using new high-quality genome assemblies and targeted annotation
- 1.6k Downloads
Natural killer (NK) cells are a diverse population of lymphocytes with a range of biological roles including essential immune functions. NK cell diversity is in part created by the differential expression of cell surface receptors which modulate activation and function, including multiple subfamilies of C-type lectin receptors encoded within the NK complex (NKC). Little is known about the gene content of the NKC beyond rodent and primate lineages, other than it appears to be extremely variable between mammalian groups. We compared the NKC structure between mammalian species using new high-quality draft genome assemblies for cattle and goat; re-annotated sheep, pig, and horse genome assemblies; and the published human, rat, and mouse lemur NKC. The major NKC genes are largely in the equivalent positions in all eight species, with significant independent expansions and deletions between species, allowing us to propose a model for NKC evolution during mammalian radiation. The ruminant species, cattle and goats, have independently evolved a second KLRC locus flanked by KLRA and KLRJ, and a novel KLRH-like gene has acquired an activating tail. This novel gene has duplicated several times within cattle, while other activating receptor genes have been selectively disrupted. Targeted genome enrichment in cattle identified varying levels of allelic polymorphism between the NKC genes concentrated in the predicted extracellular ligand-binding domains. This novel recombination and allelic polymorphism is consistent with NKC evolution under balancing selection, suggesting that this diversity influences individual immune responses and may impact on differential outcomes of pathogen infection and vaccination.
KeywordsNatural killer cells C-type lectin Natural killer complex Leukocyte receptor complex KLRA KLRC
Natural killer (NK) cells are a diverse population of circulating lymphoid cells with cytotoxic and cytokine-secreting functions, particularly in response to intracellular pathogen infections and neoplasms. Although rare, primary NK cell immunodeficiency leads to complications and/or death from severe herpesviral infections, virus-associated tumor growth, leukemia, and mycobacterial infections (Orange 2013). Dysregulated MHC class I expression on nucleated cells, such as during viral infection, is recognized by a diverse repertoire of NK cell surface receptors which mediate their immune functions through direct recognition of equally diverse MHC class I molecules. In mammals, NK cell receptors for MHC class I are encoded within two unrelated and independently segregating gene complexes, the leukocyte receptor complex (LRC) containing genes encoding the killer cell immunoglobulin-like receptors (KIR) and the natural killer complex (NKC) containing multiple members of killer cell lectin-like receptor genes (KLR). Both gene complexes evolve rapidly, vary in gene content within and between species, and can encode both activating and inhibitory polymorphic receptors. Thus, a highly diverse NK cell repertoire containing multiple highly similar receptors allows for a finely-tuned ability to discriminate MHC class I expression between healthy and damaged cells.
The number of KIR and KLR genes is highly variable between mammalian species, with primate and rodent species the best studied to date (Guethlein et al. 2015). Humans and other higher primates have an expanded, highly polymorphic and gene variable KIR locus, and possess four functional KLRC genes (NKG2A, C, E, and F) and a single KLRA (Ly49) gene or pseudogene (Trowsdale et al. 2001; Wende et al. 1999; Wilson et al. 2000). In contrast, rats (Rattus norvegicus) possess a single KIR in their LRC, whereas mice (Mus musculus) possess two KIR genes on chromosome X and which are thought to have alternative functions (Hoelsbrekken et al. 2003). However, rodents have a highly expanded and diverse repertoire of KLRA genes (Anderson et al. 2001; Higuchi et al. 2010) and a unique, yet related KLRH gene (Naper et al. 2002). The primate KIR and the rodent KLRA and KLRH bind classical MHC class I molecules to control NK cell function (Daniels et al. 1994; Daws et al. 2012), a rare example of convergent evolution that illustrates the fundamental importance of this receptor-ligand system.
Beyond rodents and higher primates, a few other species have been studied in some detail. Horses (Equus caballus), for example, possess an expanded KLRA repertoire of five polymorphic genes and a single putatively functional KIR3DL-like gene (Futas and Horin 2013; Takahashi et al. 2004). The mouse lemur (Microcebus murinus) has expanded a different NKC gene family, possessing five functional KLRC and three functional KLRD (CD94) (Averdam et al. 2009). Together, KLRC and KLRD form a heterodimeric pair providing the mouse lemur with a substantially expanded KLRC/KLRD combinatorial repertoire. However, NK receptor diversification is not always a prerequisite for a species survival. Several marine carnivores, (seals and sea lions), possess a single functional KIR and single functional KLRA, while their terrestrial relatives, cats (Felis catus) and dogs (Canis lupus), only appear to have a functional KLRA, with the KIR gene being disrupted or deleted, respectively (Hammond et al. 2009). Pigs (Sus scrofa) also possess a single KIR and a single KLRA, yet it is uncertain if either of these genes are functional (Gagnier et al. 2003; Sambrook et al. 2006). For many of these species, however, the remainder of their KLR repertoire remains unresolved.
Findings to date indicate that cattle (Bos taurus) are unique in having expanded and diversified NK cell receptor genes within both the NKC and LRC (Birch and Ellis 2007; Guethlein et al. 2007; McQueen et al. 2002; Storset et al. 2003). Cattle possess at least seven KLRC, two KLRD, and a single but polymorphic KLRA within the NKC (Birch and Ellis 2007; Dobromylskyj et al. 2009), and eight functional KIR genes in the LRC (Sanderson et al. 2014). However, the characterization of the NKC relies largely on the current public genome assembly (Elsik et al. 2009). Immune gene complexes, however, are often highly repetitive due to the presence of many very similar genes often with small intergenic intervals that are enriched with interspersed repetitive elements. Together with allelic polymorphism, this can create significant assembly problems during whole-genome sequencing attempts. As a consequence, several draft genome assemblies indicate that the NKC has had a complex and dynamic evolutionary history during mammalian radiation, a hallmark of strong positive selection, but the genome sequence of these regions and associated annotation is either preliminary or lacking.
An accurate NKC genome sequence and correct annotation is essential to inform functional genomic studies. In an age of heightened concern for food security, this is particularly important for immunogenetic variation in food-producing species that could be exploited to improve resilience to infectious diseases. To address this, we have improved and confirmed the assembly of the cattle and goat (Capra hircus) NKC using bacterial artificial chromosomes (BACs) and recent long-read genome assemblies (Smith and Medrano, unpublished) (Bickhart et al. 2016). These were newly annotated, as were the available draft reference genomes for the sheep (Ovis aries) (The International Sheep Genomics Consortium et al. 2010), pig (Groenen et al. 2012), and horse (Wade et al. 2009). These were then compared to the well-characterized NKC structures of the rat (Flornes et al. 2010), human (Hofer et al. 2001), and mouse lemur (Averdam et al. 2009), allowing us to propose a model for the NKC evolution during the past approximately 92 Myr. To additionally assess the level of intraspecies variability in cattle and genes under selection, we investigated polymorphism within the NKC of 23 individuals including breeds from both B. taurus and B. indicus, whose wild ancestors began to diverge approximately 2 Ma (Hiendleder et al. 2008) and which were domesticated separately about 10,000 years ago.
Materials and methods
Peripheral blood samples from B. taurus and B. indicus cattle were collected in accordance with the UK Animal (Scientific Procedures) Act 1986 and approved by either The Pirbright Institute Ethics Committee or The Roslin Institute’s Animal Welfare and Ethics Committee. The Chillingham samples were from animals culled for welfare reasons. Blood sampling of Kuchinoshima-Ushi cattle was carried out in accordance with the Regulations for Animal Experiments in Nagoya University and the Guidelines for the Care and Use of Laboratory Animals by the Tokyo University of Agriculture.
The region spanning the NKC from immediately upstream of KLRA to immediately downstream of KLRE was extracted from the current genome builds within Ensembl (Cunningham et al. 2015) for cattle (UMD_3.1, chr 5 99,387,020–100,235,099), sheep (Oar_v3.1, chr 3 203,826,025–204,418,113), pig (Sscrofa10.2, chr 5 64,112,755–64,634,584), and horse (Equ Cab 2, chr 6 37,209,165–38,556,956) and from the National Center for Biotechnology Information (NCBI) for goat (CHIR_1.0, CM001714 91,233,093–91,817,092). Additional scaffolds for goat and cattle were generated using long reads (NCBI accession numbers PRJNA290100 and KX592814, respectively) and represent the first livestock genomes assembled de novo from PacBio reads alone. The specifics of sequence generation, contig assembly, scaffolding, and validation to create the long-read assemblies are extensive and are described elsewhere for goat (Bickhart et al. 2016) and will be for cattle (Smith and Medrano, unpublished). Genes within the NKC builds as well as on individual BAC clones were identified using the Basic Local Alignment Search Tool (BLAST) against GenBank and known NKC genes (Altschul et al. 1990). Exonic structure was informed using BLAST within NCBI against the reference RNA sequence database (i.e., RefSeq_RNA). HMMgene was additionally used to hunt for putative open reading frames (Krogh 1997).
Bacterial artificial chromosomes and sequencing
A BAC library was previously developed from a Friesian dairy bull (Di Palma 1999; Di Palma et al. 2002) and was screened for NKC-containing genes. Primers were designed to amplify KLRC1, KLRD1, KLRD2, KLRJ, and the flanking genes gamma-aminobutyric acid receptor-associated protein-like 1 (GABARAPL1) and serine/threonine/tyrosine kinase 1 (STYK1) (Table S1). A PCR-based screen of 39,936 BAC clones (∼1.5× genome coverage) identified three clones overlapping the NKC region, which are TPI4222-309A12, TPI4222–102F7, and TPI4222-343E2. Despite positive results with whole genomic DNA (gDNA) to verify primer specificity, however, no clones were PCR positive for KLRJ or STYK1. The NCBI genome viewer was queried to identify four additional BAC clones from two animals, which are two BAC clones from a different Friesian bull (RP42–154D6 and RP42–162P15) and two clones from the Hereford bull L1 Domino 99375 (CH240-60G5 and CH240-239G9). The GenBank accessions for the BAC clones used in the present study are TPI4222-343E2 (KX611578), TPI4222-309A12 (KX611577), TPI4222–102F7 (KX611576), RP42–162P15 (KX698608), RP42–154D6 (KX698607), CH240-239G9 (AC170009), and CH240-60G5 (AC156849).
BAC clones were expanded overnight and BAC DNA was purified using the Qiagen Large Construct Kit (Qiagen, GmbH). Purified DNA from three clones (TPI4222-309A12, TPI4222–102F7, and TPI4222-343E2) was sequenced using Illumina MiSeq with 250 × 250-bp paired-end reads (Source Biosciences Inc., Nottingham, UK) and de novo assembled using Velvet (Zerbino and Birney 2008). As the resultant assemblies failed to yield single contigs, the assembled sequences were manually scaffolded and supported by BLAST comparisons of individual contigs. These final, manual assemblies resulted in single contigs for each of the three clones. Band sizes generated by HindIII endonuclease digestion of BAC DNA matched those predicted by assembly, suggesting that the BAC assemblies were accurate.
The remaining clones were sequenced at the USDA-ARS Meat Animal Research Center (Clay Center, NE) using the PacBio RSII platform (Pacific Biosciences of California, Inc.). To further confirm the Illumina assemblies, TPI4222-309A12 and TPI4222–102F7 were re-sequenced in this manner, and these assemblies matched those generated using the Illumina data. Read filtering and assembly were conducted using the Pacific Biosciences SMRT Analysis software (version 2.3.0; http://www.pacb.com/devnet/). The resulting contigs were circularized by comparing the contig ends against the whole contigs to identify overlap, then the cloning vector was identified and removed to produce contigs with the first base being the first beyond the 3′ end of the cloning vector accession sequence (AY487252). Potential errors remaining in the contig sequence were removed by re-mapping all of the subreads to the edited contigs, producing high-quality (<0.01% error) genomic sequences.
Nomenclature, manual annotation, and phylogenetics
Where possible, the Human Genome Organization (HUGO) Gene Nomenclature Committee (HGNC)-approved gene nomenclature is used. For ease of reference, common gene synonyms are also provided upon first usage for many of the NKC genes. KLRC gene subgroup nomenclature is maintained based on the previous identification of KLRC1 and KLRC2-like complementary DNA (cDNA) sequences in cattle (Birch and Ellis 2007). All NKC genome builds and individual BAC clones were manually annotated using Artemis (Rutherford et al. 2000). Where possible, gene structure was informed using cDNA evidence. Pseudogenes were defined based on the presence of frameshifts and premature stop codons that would prevent the production of a functional protein. Repetitive elements were identified using RepeatMasker version open-4.0.5 (Smit et al. 2015). Recurrence plot sequence identity comparisons of genome assemblies were made using DOTTER (Sonnhammer and Durbin 1995) and a sliding window of 200 bp. Predictions of transmembrane (TM) regions were made using TMHMM (Krogh et al. 2001). Alignments of NKC genes were generated using ClustalW (Thompson et al. 1994), and phylogenetic analyses were performed within MEGA6 (Tamura et al. 2013) using maximum likelihood based on the Tamura three-parameter model and the partial deletion method using a 95% cutoff and 100 bootstrap iterations (Tamura 1992). Complete mitochondrial genomes were used to show species divergence (cattle, V00654; goat, KP271023; sheep, AF010406; pig, NC_000845; horse, AB859014; rat, X14848; mouse lemur, NC_028718; and human, AP008824). Divergence times are based on relative branch lengths and estimated using the reported divergence estimates for simians and prosimians (68.2–81.2 Ma (Pozzi et al. 2014)), sheep and cattle (30 Ma (Hiendleder et al. 1998)), and cattle and pigs (60 Ma (Meredith et al. 2011)). Gene structure and organization was compared to human (Hofer et al. 2001), mouse lemur (Averdam et al. 2009), and rat (Flornes et al. 2010), as high-quality assemblies and annotations exist for all these species. Rats were chosen as the representative of rodents, as apart from variability in the number of KLRA, there is little known structural variation between mice and rats (Dissen et al. 2008).
Animals used for SNP analysis
Heparinized peripheral blood was acquired from 15 Friesian cattle (B. taurus) belonging to an MHC defined herd at The Pirbright Institute (Ellis et al. 1999). Semen from two Friesian breeding bulls (Blackisle Garve and Nerewater Tiptop) was purchased from Genus UK. Additional gDNA was obtained from two individuals from the feral Chillingham Park herd (Alnwick, Northumberland, UK), which have been genetically isolated for ∼300 years (Visscher et al. 2001), an individual from a genetically isolated cattle population on Kuchinoshima Island (Japan) (Kawahara-Miki et al. 2011), two Sahiwal cattle (B. indicus), and a single Nelore (B. indicus). The gDNA sourced from each of the B. taurus and B. indicus animals was whole genome amplified using the REPLI-g Mini Kit (Qiagen, GmbH) following manufacturers’ instructions.
Genomic enrichment of cattle NKC
Mononuclear cells (PBMCs) were separated from the heparinized peripheral blood using Histopaque-1083 (Sigma-Aldrich Corporation), and gDNA was isolated using the QIAamp DNA Mini Kit (Qiagen, GmbH) following the manufacturers’ instructions. The quantity of purified gDNA was assessed for each animal with the Quant-iT PicoGreen dsDNA assay (Thermo Fisher Scientific) using the low-range standard curve. An aliquot of gDNA from every animal was sheared using a Covaris S220 Focused-ultrasonicator (Covaris, Inc.). Instrument parameters provided by the manufacturer were used to fragment the DNA to insert sizes between 500 and 650 bp. Indexed paired-end gDNA libraries were constructed using a low-throughput, low-sample number TruSeq DNA sample preparation kit (Illumina, Inc.). Four multiplexed sequencing libraries were prepared, one for each of four independent genome enrichment experiments. Each multiplexed library was constructed using 1-μg input DNA per animal, and ligation products were size selected (>500 bp) on an agarose gel and purified as described in the TruSeq protocol. An aliquot was removed from each and used as PCR template to assess the quality of the constructed sample library. PCR amplification was carried out as described in the TruSeq DNA Sample Preparation Guide. Sample library quality assessment was carried out by running PCR products on a DNA1000 chip using a 2100 Bioanalyzer instrument (Agilent Technologies). Amplification of each multiplexed sample library was performed as described in the NimbleGen SeqCap EZ Library SR User’s Guide (version 4.1). To enrich cattle gDNA from the NKC, custom oligonucleotide probes were designed and synthesized as a SeqCap EZ Developer Library (Roche Sequencing) and enrichment was performed as described in the manufacturers’ protocol. Human Cot-1 DNA was used to block repetitive regions of the cattle genome. DNA was purified at each stage of the Roche protocol using Agencourt AMPure XP DNA purification beads (Beckman Coulter, Inc.). The captured multiplex DNA sample was washed and recovered as outlined in the Roche protocol. Amplification of the enriched multiplex DNA sample libraries used LM-PCR as described in the Roche protocol. To determine how successful the enrichment was, DNA was analyzed on a DNA1000 chip using a 2100 Bioanalyzer instrument (Agilent Technologies) and using qPCR. The degree of enrichment measured across the four independent captures indicated that each library was successfully generated.
Sequencing and variant calling
The four enriched multiplex DNA sample libraries were independently sequenced using a MiSeq desktop sequencer (Illumina, Inc.) at The Pirbright Institute. The MiSeq Reagent Kit version 2 (Illumina, Inc.) was used to produce either 2 × 230- or 2 × 250-bp paired-end reads per run. Multiplexed DNA sample libraries were diluted and sequenced at a final concentration of 8 pM. Each sequencing run used a PhiX control spike which was denatured and diluted to 12.5 pM. The final pool of sequenced DNA was comprised of 99% sample library and 1% PhiX. Library preparation, sample loading, and MiSeq preparation steps were carried out as described in the manufacturers’ protocol. Resultant reads were mapped to the cattle genome build UMD_3.1 using the Burrows-Wheeler Aligner (BWA; version 0.7.5a) (Li and Durbin 2009), and variant sites were identified using SAMtools (version 0.1.18) (Li et al. 2009) and VarScan (version 2.3.6) (Koboldt et al. 2012).
Re-assembly of the cattle and goat NKC
We performed a similar analysis with the current public goat genome assembly (CHIR_1.0) and a new long-read de novo goat assembly, ARS1, that has also used PacBio sequencing to improve assembly contiguity (Bickhart et al. 2016). In the CHIR_1.0 assembly, the NKC region from KLRA to KLRE spans approximately 584 kb and contains a total of 53 sequence gaps (Fig. 1b). In contrast, a single contiguous region of approximately 600 kb was identified on a scaffold of approximately 113 mb in the ARS1 assembly. Although these regions are structurally very similar, the long-read assembly resolved the numerous gaps in CHIR_1.0 and included an extra 15 kb of sequence containing a unique NKC gene (KLRC1–2; Fig. 1b). Our findings with the cattle long-read assembly provide high confidence that the goat ARS1 assembly is accurate, and the scaffolds have been verified by both optical map and chromatin conformation analysis (Bickhart et al. 2016). Therefore, we did not repeat the BAC-based analysis that was performed for cattle. The complete cattle and goat NKC from the long-read assemblies were manually annotated at high resolution to identify all of the exons related to NKC genes and examine which had the potential to encode functional genes.
Repetitive elements within the cattle NKC
Repetitive elements are believed to play an important role in NKC evolution as potential hot spots for unequal crossover (Carlyle et al. 2008). Across the cattle NKC, interspersed elements were found to comprise approximately 51.35% of the entire nucleotide sequence, compared to approximately 44.16% reported across the whole genome (Elsik et al. 2009). The number of long-interspersed elements (LINEs) is particularly enriched, accounting for approximately 36.4% of NKC sequence compared to 23.29% for the whole genome. These values are consistent with those reported for the mouse KLRA region (Carlyle et al. 2008), suggesting that similar recombinatorial mechanisms are conserved between species across the NKC.
The unique organization and gene expansions within the ruminant NKC
Five of the six KLRC1 genes in this expanded region of cattle are >95% identical to one another, indicative of more recent evolutionary expansion. All six inhibitory KLRC1 genes and all three activating KLRC2 genes are putatively functional, based on their open reading frame sequences and the conservation of canonical splice site motifs. An unusual feature was previously reported, in which a cattle KLRC1 cDNA (NKG2A-07) possessed a cytoplasmic tail containing two immunoreceptor tyrosine-based inhibition motifs (ITIMs) and a predicted TM region containing an arginine residue creating the potential for both inhibitory and activating functions, respectively (Birch and Ellis 2007). The existence of this gene was confirmed in the ARS-UCDv0.1 genome assembly, as it matches to bota_KLRC1–4, which is located in the center of the expanded cattle KLRC region (Fig. 2).
The other cluster of KLRC genes flanked by KLRK and KLRI in rat, goat, and cattle appears more conserved across these species. Both the cattle and goat genomes contain a single inhibitory KLRC1 gene and a KLRC2 pseudogene, while cattle possess an additional likely functional KLRC2 (Fig. 2). The KLRC2 pseudogenes of cattle and goats share the same disabling features and are both missing the last two exons, indicating that functionality was lost prior to their divergence from a common ruminant ancestor. Importantly, KLRD has duplicated in both cattle and goats (Supplementary Table S2), consistent with an expanded KLRC repertoire in other species and suggesting that the heterodimeric KLRC/KLRD partnership has been preserved and subject to similar diversification pressures.
KLRH has been reactivated and expanded in cattle
The existence of KLRH-like genes in ruminants is intriguing, as KLRH has not yet been described beyond the rodent lineage. Five such genes in cattle possess activating TM and cytoplasmic domains (bota_KLRH2, bota_KLRH3, bota_KLRH4, bota_KLRH5, and bota_KLRH6), of which three are putatively functional (bota_KLRH2, bota_KLRH5, and bota_KLRH6), although one of these contains two non-canonical splice sites (bota_KLRH6). Furthermore, one of two KLRH genes in the goat has likewise aquired a KLRC-like tail (cahi_KLRH2). However, exon 2 of this tail appears to have been subsequently deleted from cahi_KLRH2. Moreover, two non-canonical splice sites in exons 3 and 4 and a frameshift in exon 5 have likely destroyed its functionality, despite maintenance of an open reading frame across exon 1 to exon 3. Two additional genes in cattle (bota_KLRH1 and bota_KLRH7) and one in goats (cahi_KLRH1) do not appear to be associated with exons encoding the N-terminal intracellular and TM domains. An 80-bp fragment most similar to a KLRA-like N-terminal region was identified approximately 4 kb upstream of cahi_KLRH1 using the NCBI conserved domain database (Marchler-Bauer et al. 2015), whereas BLASTN, BLASTX, and HMMgene failed to predict this portion of the gene. Despite this, both cahi_KLRH1 and bota_KLRH1 intriguingly form an intact open reading frame across the exons encoding the C-terminal lectin domain. Together, these findings indicate that KLRH was functionally resurrected with the acquisition of an activating tail prior to the Bovinae-Caprinae divergence and further expanded in the bovine lineage.
Receptors with activating potential have been disrupted
All four of the cattle pseudogenes and both goat pseudogenes that contain the full complement of exons (bota_KLRH3, bota_KLRH4, bota_KLRC2–1, cahi_KLRH2, cahi_KLRC2–1) encode potentially activating receptors (Fig. 2). It is interesting to note that none of the genes share the same disrupting features. Whereas bota_KLRH4 contains a single frameshift in exon 5, bota_KLRH3 contains no disabling genetic lesions within the lectin domain-encoding exons but contains a frameshift within exon 3 and non-canonical splice site in exon 2. The cattle bota_KLRH7 gene is disabled by nonsense mutations in both exons 4 and 5 and a non-canonical splice site and frameshift in exon 5, while the other KLRH-like gene lacking a tail (bota_KLRH1) possesses an intact open reading frame with canonical splice sites. We conclude from this diversity of disabling mutations that both the recombinant KLRH genes with activating KLRC2-like tails and those apparently lacking tails were independently disrupted after their expansion.
The initiation codon of bota_KLRI2 is mutated (ATG− > AAG) in both the UMD_3.1 and ARS-UCDv0.1 assemblies. Although a potential alternative start site exists several codons downstream, a non-canonical splice site at the end of exon 1 may additionally disrupt the gene. We observed an additional 2-bp frameshift that results in multiple downstream stop codons in both of the overlapping RPCI-42 BAC clones, arguing that this gene has become non-functional in the cattle genome. In contrast, no disabling mutations were observed within the genes encoding inhibitory receptors. This type of selective disruption of activating genes is also a feature of the cattle KIR (Sanderson et al. 2014), as well as human KIR and mouse KLRA (Abi-Rached and Parham 2005).
Allelic polymorphism is concentrated in the predicted extracellular and ligand-binding domains
Allelic polymorphism is a feature of expanded NK cell receptor complexes (Trowsdale et al. 2001). We used the fact that the cattle BAC clones were derived from three different individuals to examine the allelic variability between the overlapping sequences. No SNP variation was identified between the Friesian-derived TPI4222-102F7 and TPI4222-309A12 clones, suggesting that they were derived from the same haplotype, which may be a consequence of the historically or current low effective population size for this breed. More surprisingly, there was little polymorphism between these clones and the ARS-UCDv0.1 assembly, despite the latter being derived from a Hereford individual. Across 180 kb of overlap between the assembly and TPI4222-309A12, there were a total of 177 SNPs and 78 insertions/deletions (indels). Only one of these was located within an exon, a synonymous SNP within exon 2 of bota_KLRH3. Similarly, none of the 713 SNPs between TPI4222-343E2 and the genome assembly were within the coding regions of bota_KLRE or bota_KLRD2. The CHORI-240 clones, which were used as a major part of the UMD_3.1 assembly and were derived from L1 Domino 99375, the sire of the animal used for the long-read assembly, contained no SNPs relative to the ARS-UCDv0.1 assembly, perhaps because they came from the same haplotype assembled from his daughter’s genome. However, the CH240-239G9 and RP42–154D6 sequences share a 3-bp insertion within one of the lectin domain-encoding exons of bota_KLRJ. Three additional non-synonymous changes were further observed in the lectin region of bota_KLRJ on RP42–162P15, but not on RP42–154D6, relative to the genome assembly. Thus, both overlapping RPCI-42 clones derive from different haplotypes. Despite this, both of these clones possess a shared, identical copy of bota_KLRI2, indicating that these two haplotypes have either recombined or that the bota_KLRI2 paralog common to both has undergone gene conversion.
Comparison to other mammalian genome drafts confirms the plasticity of NKC genes and the unique organization in ruminants
To better understand the history of NKC evolution, we compared the cattle and goat long-read assemblies with the available reference genomes for sheep (Oar_v3.1), pig (Sscrofa10.2), and horse (Equ Cab 2) that are all based on short sequence reads. As expected based on our findings with the UMD_3.1 and CHIR_1.0 assemblies, the NKCs of these reference genomes contain numerous sequence gaps, making conclusions about detailed genomic structure and allele content provisional (Supplementary Fig. S2). However, the sheep NKC assembly is consistent with the goat and cattle, with the only substantial differences being within the highly variable region containing the expanded KLRC and KLRH, which is heavily fragmented into 17 contigs. Furthermore, while there are numerous KLRC-like fragments present on various contigs in the sheep assembly, it is impossible to determine whether any of them are associated with a KLRH-like gene, as seen in cattle and goat.
The pig is a more distantly related artiodactyl to ruminants, sharing a common ancestor approximately 60 Ma (Meredith et al. 2011). They possess a single inhibitory KLRA and KLRJ with two small KLRH-like fragments proximal to and in the same orientation as KLRA, similar to the ruminant genomes. In addition, a single KLRH gene lacking the first three exons is proximal to KLRJ in the same position and in the same orientation as KLRH1 in ruminants. Overall, the pig NKC is considerably more compact and shows little evidence of gene expansion (Supplementary Fig. S2 and Supplementary Table S2). Notably, the porcine NKC has only a single activating gene, KLRK. A single inhibitory KLRI1, a single inhibitory KLRC1, and a single gene each of KLRD and KLRE complete this region of the porcine NKC.
The horse shared a common ancestor with cattle approximately 80 Ma (Meredith et al. 2011). In the current genome assembly, we identified NKC genes spanning a large region of approximately 1260 kb, which contains 13 sequence assembly gaps. We identified five functional inhibitory KLRA genes and a single KLRA pseudogene, which is consistent with previous studies based on cDNA evidence (Takahashi et al. 2004). Two KLRH-like genes are also present in the same positions to those found in the artiodactyl genomes and likewise lack the first three exons. As in pigs, horses appear to possess a single functional gene each of KLRJ, KLRD, KLRE, inhibitory KLRI1, and activating KLRK. Intriguingly, the KLRC locus flanked by KLRI1 and KLRK appears to be substantially expanded compared to all other known genomes. Seven putatively functional inhibitory KLRC1 genes, three KLRC2 genes (of which only two have activating motifs), and three KLRC1 pseudogenes were identified in total. Thus, despite expansion, the horse KLRC gene cluster has retained only two putatively functional activating members.
The evolution of the NKC
This study examined the structure and gene content of the KLR region of the NKC in the following eight mammalian species: cattle, sheep, goats, pigs, horses, rats, lemurs, and humans. Upstream from KLRE exist closely related C-type lectin (i.e., CLEC) genes, for which the gene content is not substantially different between these species apart from rodents, which have expanded their CLEC2D and KLRB loci. However, within the KLR gene cluster, our analyses revealed extensive species-specific expansions and contractions within a generally conserved framework of genes. In particular, the KLRA locus has undergone extensive expansion and contraction during mammalian evolution, as evidenced by its highly variable gene content between species and the duplication and subsequent divergence of the closely related KLRH and KLRJ genes. All three of these genes originated prior to the divergence of the Laurasiatheria (e.g., cattle, pigs, whales, horses, and dogs) and the Euarchontoglires (e.g., humans, lemurs, rabbits, and rats) approximately 92 Ma (Meredith et al. 2011). Within the human lineage, both KLRH and KLRJ were deleted (as well as KLRI and KLRE among all primates). Although different in rodents, KLRH is structurally similar in the mouse lemur and the Laurasiatherians, indicating that its sequence inverted and the intracellular and TM domains were deleted prior to the divergence of these two major clades. KLRH in rodents, on the other hand, is encoded in the same orientation as KLRA and possesses an inhibitory intracellular tail. This suggests either a duplication and subsequent contraction or that a recombination occurred between the lectin domains of KLRH and a KLRA gene during rodent evolution. In either case, the inverted KLRH-like sequences were deleted in the rodent lineage. It seems likely that in early ruminant evolution, this inverted region became a template for non-allelic homologous recombination (NAHR), in which sequence similarity between the regions containing KLRC1 and KLRH established a conversion tract with KLRC1 in the new location. The locus was then further expanded by duplication events and additional conversions.
Phylogenetic analysis of the C-type lectin domains of the ruminant KLRC genes was unable to resolve the sequence of duplications, although it is likely to have initially occurred with an ancestor homologous to bota_KLRC1–7. Shortly after the initial expansion, one of the now duplicated KLRH genes acquired the intracellular tail from an activating KLRC2 gene. This new KLRH/KLRC2 hybrid then subsequently duplicated along with the rest of the locus, giving rise to five such hybrids in cattle (KLRH2, KLRH3, KLRH4, KLRH5, and KLRH6) and one in goats (cahi_KLRH2). Three of these in cattle (KLRH2, KLRH5, and KLRH6) are putatively functional. As the knockout mutations differ between the remaining three full-length pseudogenes, it is apparent that following the initial recombination event and subsequent expansion, these genes were independently disabled. This suggests that the KLRH/KLRC2 hybrids were functional following the recombination event and prior to their expansion. Intriguingly, the open reading frame for the C-type lectin domain of KLRH remains functionally preserved in cattle, sheep, goats, pigs, horses, and mouse lemurs, despite the lack of recognizable intracellular and TM domains. As it seems unlikely for this preservation to be coincidence, this gene may be functional, perhaps expressed as a soluble receptor.
The apparent conversion of activating receptors to pseudogenes in ruminants fits with a paradigm that activating genes are quickly expanded in response to strong selection pressure, then quickly lost once beneficial function is lost (Abi-Rached and Parham 2005). In this hypothesis, retention of activating genes which are no longer useful to host survival and reproduction may be detrimental by permitting inappropriate NK cell stimulation, cytotoxicity, and autoimmunity. On the other hand, in the horse, all four identified pseudogenes contain inhibitory motifs. However, the three equine KLRC1 subgroup pseudogenes appear to have undergone block duplication along with a functional KLRC1, thus expanding a net inhibitory receptor reservoir. The disruption of activating KLRC2-1 in both cattle and goats appears to predate species divergence, as they share the same loss of the last two lectin domain exons. In cattle, the loss of activating KLRI2 function is either relatively recent or ongoing due to the less obvious nature of the knockout mutations. That this gene is missing in pigs and horses suggests that it has either been lost or its expansion has never occurred in their ancestry. The most parsimonius explanation, however, is that they evolved once and subsequently homogenized within their respective species. Thus, as a result of gene conversion, the origin of many paired NK cell receptors is ambiguous.
Use of long-read sequencing to elucidate highly repetitive genomic regions
Much of the work described was motivated by the draft quality of the public genome assemblies for livestock species. Available methods using short-read sequencing data have difficulty forming high-quality assembly of repetitive areas of the genome such as the NKC. This difficulty is compounded when the two alleles present in the animal whose genome was sequenced may be substantially different in gene content in specific areas, since the assembly attempts to project a haploid presentation of the diploid genome. Some assemblers have tendency to collapse similar repeats on the assumption that they are allelic differences, while others expand the haploid genome to include more or all of the sequence from both alleles. In both instances, uncertainty usually results in a fractured assembly in the area of tandemly duplicated genes. This is simplified by sequencing haploid representations of the genome such as in large-insert clones in bacterial vectors. Indeed, the bovine and porcine genome assemblies that included substantial content of sequence from individual BAC clones provided significantly better (but still incomplete) representations of repetitive regions. But even with relatively short stretches of haploid DNA, short-read technologies have difficulty creating unbroken assemblies when the repeat length greatly exceeds the read length. Recently, long-read technologies have overcome many of the constraints of assembling these types of genomic regions (Berlin et al. 2015; Koren et al. 2013), and we employed these methods on some of the BAC clones used to obtain bovine genome sequence.
Our results also indicate the utility of using a probe-based capture method to enrich and sequence genomic regions to elucidate allelic variation. However, given the highly repetitive nature of the recently expanded KLRC genes in cattle, short-read paired-end (i.e. 250 × 250 bp) sequencing was insufficient for mapping and assessing the diversity across this region. Furthermore, the potential for genes containing domains that have recombined from other genes, such as the case for the cattle KLRH/KLRC2 hybrids, may complicate efforts to transcriptomically assess gene expression when using standard short-read sequencing. To resolve these problems, we recommend that such repetitive immune gene clusters be sequenced and transcriptomically analyzed using long-read technology.
Our annotations of the cattle, goat, sheep, pig, and horse NKC regions have identified a large proportion of NK cell receptor gene family members that may have been subjected to expansion and contraction due to NAHR. We described the extensive KLRC expansions in cattle and horses, the discovery of KLRE and KLRI outside of rodents, the presence of KLRJ in the described species, and the identification of novel KLRH genes bearing KLRC2-like activating tails in cattle and goats. Finally, polymorphisms across the cattle NKC, and likely other species, further expands the available NK cell receptor repertoire, particularly in the KLRC/D and KLRA systems. These results fill an important evolutionary link in our understanding of the NKC and will inform future investigations of NK cell receptor diversity, assist in identifying their potential ligands, and aid in the identification of genotypes associated with differential disease outcome.
We thank William Thompson for the excellent technical support. We also thank Prof. Elizabeth Glass (The Roslin Institute, UK) for providing the Sahiwal and Nelore DNA samples, Prof. Tomohiro Kono (Tokyo University of Agriculture, Japan) for providing the Kuchinoshima-Ushi DNA, and Prof. Stephen Hall (University of Lincoln, UK) for the useful comments and for providing peripheral blood from the culled Chillingham cattle. JCS and JAH were supported by the United Kingdom Biotechnology and Biological Sciences Research Council (BBSRC) Institute Strategic Program on Livestock Viral Diseases awarded to The Pirbright Institute. MSG was supported by BBSRC grant BB/J006211/1, “Dissecting the functional impact of natural killer cell receptor variation in cattle.” SK and AMP were supported by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. Sequencing of Dominette was supported by the Agricultural Research Service of the United States Department of Agriculture (USDA-ARS) U.S. Meat Animal Research Center, USDA/NRSP8 Cattle Coordinator Funds, and University of California Davis. DMB was supported in part by appropriated project 1265-31000-096-00, “Improving Genetic Predictions in Dairy Animals Using Phenotypic and Genomic Information,” of the USDA-ARS. DMB and TPLS were also supported by the Agricultural Food Research Initiative (AFRI) competitive grant number 2015-67015-22970 from the USDA National Institute of Food and Agriculture (NIFA) Animal Health Program. Mention of trade names or commercial products in this article is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the US Department of Agriculture.
- Averdam A, Petersen B, Rosner C, Neff J, Roos C, Eberle M, Aujard F, Münch C, Schempp W, Carrington M, Shiina T, Inoko H, Knaust F, Coggill P, Sehra H, Beck S, Abi-Rached L, Reinhardt R, Walter L (2009) A novel system of polymorphic and diverse NK cell receptors in primates. PLoS Genet 5:e1000688CrossRefPubMedPubMedCentralGoogle Scholar
- Bickhart DM, Rosen BD, Koren S, Sayre BL, Hastie AR, Chan S, Lee J, Lam ET, Liachko I, Sullivan ST, Burton JN, Huson HJ, Kelley CM, Hutchison JL, Zhou Y, Sun J, Crisa A, Ponce de Leon FA, Schwartz JC, Hammond JA, Waldbieser GC, Schroeder SG, Liu GE, Dunham MJ, Shendure J, Sonstegard TS, Phillippy AM, Van Tassell CP, Smith TPL (2016) Single-molecule sequencing and conformational capture enable de novo mammalian reference genomes. bioRxiv. doi: 10.1101/064352 Google Scholar
- Cunningham F, Amode MR, Barrell D, Beal K, Billis K, Brent S, Carvalho-Silva D, Clapham P, Coates G, Fitzgerald S, Gil L, Giron CG, Gordon L, Hourlier T, Hunt SE, Janacek SH, Johnson N, Juettemann T, Kahari AK, Keenan S, Martin FJ, Maurel T, McLaren W, Murphy DN, Nag R, Overduin B, Parker A, Patricio M, Perry E, Pignatelli M, Riat HS, Sheppard D, Taylor K, Thormann A, Vullo A, Wilder SP, Zadissa A, Aken BL, Birney E, Harrow J, Kinsella R, Muffato M, Ruffier M, Searle SM, Spudich G, Trevanion SJ, Yates A, Zerbino DR, Flicek P (2015) Ensembl 2015. Nucleic Acids Res 43:D662–D669CrossRefPubMedGoogle Scholar
- Di Palma F (1999) Analysis and mapping of bovine MHC class I genes, Doctoral thesis. Reading, UK: University of ReadingGoogle Scholar
- Elsik CG, Tellam RL, Worley KC, Gibbs RA, Muzny DM, Weinstock GM, Adelson DL, Eichler EE, Elnitski L, Guigo R, Hamernik DL, Kappes SM, Lewin HA, Lynn DJ, Nicholas FW, Reymond A, Rijnkels M, Skow LC, Zdobnov EM, Schook L, Womack J, Alioto T, Antonarakis SE, Astashyn A, Chapple CE, Chen HC, Chrast J, Camara F, Ermolaeva O, Henrichsen CN, Hlavina W, Kapustin Y, Kiryutin B, Kitts P, Kokocinski F, Landrum M, Maglott D, Pruitt K, Sapojnikov V, Searle SM, Solovyev V, Souvorov A, Ucla C, Wyss C, Anzola JM, Gerlach D, Elhaik E, Graur D, Reese JT, Edgar RC, McEwan JC, Payne GM, Raison JM, Junier T, Kriventseva EV, Eyras E, Plass M, Donthu R, Larkin DM, Reecy J, Yang MQ, Chen L, Cheng Z, Chitko-McKown CG, Liu GE, Matukumalli LK, Song J, Zhu B, Bradley DG, Brinkman FS, Lau LP, Whiteside MD, Walker A, Wheeler TT, Casey T, German JB, Lemay DG, Maqbool NJ, Molenaar AJ, Seo S, Stothard P, Baldwin CL, Baxter R, Brinkmeyer-Langford CL, Brown WC, Childers CP, Connelley T, Ellis SA, Fritz K, Glass EJ, Herzig CT, Iivanainen A, Lahmers KK, Bennett AK, Dickens CM, Gilbert JG, Hagen DE, Salih H, Aerts J, Caetano AR et al (2009) The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science 324:522–528CrossRefPubMedPubMedCentralGoogle Scholar
- Groenen MA, Archibald AL, Uenishi H, Tuggle CK, Takeuchi Y, Rothschild MF, Rogel-Gaillard C, Park C, Milan D, Megens HJ, Li S, Larkin DM, Kim H, Frantz LA, Caccamo M, Ahn H, Aken BL, Anselmo A, Anthon C, Auvil L, Badaoui B, Beattie CW, Bendixen C, Berman D, Blecha F, Blomberg J, Bolund L, Bosse M, Botti S, Bujie Z, Bystrom M, Capitanu B, Carvalho-Silva D, Chardon P, Chen C, Cheng R, Choi SH, Chow W, Clark RC, Clee C, Crooijmans RP, Dawson HD, Dehais P, De Sapio F, Dibbits B, Drou N, Du ZQ, Eversole K, Fadista J, Fairley S, Faraut T, Faulkner GJ, Fowler KE, Fredholm M, Fritz E, Gilbert JG, Giuffra E, Gorodkin J, Griffin DK, Harrow JL, Hayward A, Howe K, Hu ZL, Humphray SJ, Hunt T, Hornshoj H, Jeon JT, Jern P, Jones M, Jurka J, Kanamori H, Kapetanovic R, Kim J, Kim JH, Kim KW, Kim TH, Larson G, Lee K, Lee KT, Leggett R, Lewin HA, Li Y, Liu W, Loveland JE, Lu Y, Lunney JK, Ma J, Madsen O, Mann K, Matthews L, McLaren S, Morozumi T, Murtaugh MP, Narayan J, Nguyen DT, Ni P, Oh SJ, Onteru S, Panitz F, Park EW et al (2012) Analyses of pig genomes provide insight into porcine demography and evolution. Nature 491:393–398CrossRefPubMedPubMedCentralGoogle Scholar
- Kawahara-Miki R, Tsuda K, Shiwa Y, Arai-Kichise Y, Matsumoto T, Kanesaki Y, S-i O, Ebihara S, Yajima S, Yoshikawa H, Kono T (2011) Whole-genome resequencing shows numerous genes with nonsynonymous SNPs in the Japanese native cattle Kuchinoshima-Ushi. BMC Genomics 12:103–103CrossRefPubMedPubMedCentralGoogle Scholar
- Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, Geer LY, Geer RC, He J, Gwadz M, Hurwitz DI, Lanczycki CJ, Lu F, Marchler GH, Song JS, Thanki N, Wang Z, Yamashita RA, Zhang D, Zheng C, Bryant SH (2015) CDD: NCBI’s conserved domain database. Nucleic Acids Res 43:D222–D226CrossRefPubMedGoogle Scholar
- Meredith RW, Janečka JE, Gatesy J, Ryder OA, Fisher CA, Teeling EC, Goodbla A, Eizirik E, Simão TLL, Stadler T, Rabosky DL, Honeycutt RL, Flynn JJ, Ingram CM, Steiner C, Williams TL, Robinson TJ, Burk-Herrick A, Westerman M, Ayoub NA, Springer MS, Murphy WJ (2011) Impacts of the cretaceous terrestrial revolution and KPg extinction on mammal diversification. Science 334:521–524CrossRefPubMedGoogle Scholar
- Sambrook JG, Sehra H, Coggill P, Humphray S, Palmer S, Sims S, Takamatsu HH, Wileman T, Archibald AL, Beck S (2006) Identification of a single killer immunoglobulin-like receptor (KIR) gene in the porcine leukocyte receptor complex on chromosome 6q. Immunogenetics 58:481–486CrossRefPubMedGoogle Scholar
- Sanderson ND, Norman PJ, Guethlein LA, Ellis SA, Williams C, Breen M, Park SDE, Magee DA, Babrzadeh F, Warry A, Watson M, Bradley DG, MacHugh DE, Parham P, Hammond JA (2014) Definition of the cattle killer cell Ig-like receptor gene family: comparison with aurochs and human counterparts. J Immunol 193:6016–6030CrossRefPubMedPubMedCentralGoogle Scholar
- Smit A, Hubley R, Green P (2015) RepeatMasker Open-4.0. http://www.repeatmasker.org
- Wade CM, Giulotto E, Sigurdsson S, Zoli M, Gnerre S, Imsland F, Lear TL, Adelson DL, Bailey E, Bellone RR, Blocker H, Distl O, Edgar RC, Garber M, Leeb T, Mauceli E, MacLeod JN, Penedo MC, Raison JM, Sharpe T, Vogel J, Andersson L, Antczak DF, Biagi T, Binns MM, Chowdhary BP, Coleman SJ, Della Valle G, Fryc S, Guerin G, Hasegawa T, Hill EW, Jurka J, Kiialainen A, Lindgren G, Liu J, Magnani E, Mickelson JR, Murray J, Nergadze SG, Onofrio R, Pedroni S, Piras MF, Raudsepp T, Rocchi M, Roed KH, Ryder OA, Searle S, Skow L, Swinburne JE, Syvanen AC, Tozaki T, Valberg SJ, Vaudin M, White JR, Zody MC, Lander ES, Lindblad-Toh K (2009) Genome sequence, comparative analysis, and population genetics of the domestic horse. Science 326:865–867CrossRefPubMedPubMedCentralGoogle Scholar
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.