Immunogenetics, Volume 65, Issue 1, pp 25–35

Identification of natural killer cell receptor genes in the genome of the marsupial Tasmanian devil (Sarcophilus harrisii)

Authors

  • Lauren E. van der Kraan
    • Faculty of Veterinary Science, University of Sydney
  • Emily S. W. Wong
    • Faculty of Veterinary Science, University of Sydney
  • Nathan Lo
    • School of Biological Sciences, University of Sydney
  • Beata Ujvari
    • Faculty of Veterinary Science, University of Sydney
    • Faculty of Veterinary Science, University of Sydney
Original Paper

DOI: 10.1007/s00251-012-0643-z

Cite this article as:
van der Kraan, L.E., Wong, E.S.W., Lo, N. et al. Immunogenetics (2013) 65: 25. doi:10.1007/s00251-012-0643-z

Abstract

Within the mammalian immune system, natural killer (NK) cells contribute to the first line of defence against infectious agents and tumours. Their activity is regulated, in part, by cell surface NK cell receptors. NK receptors can be divided into two unrelated, but functionally analogous superfamilies based on the structure of their extracellular ligand-binding domains. Receptors belonging to the C-type lectin superfamily are predominantly encoded in the natural killer complex (NKC), while receptors belonging to the immunoglobulin superfamily are predominantly encoded in the leukocyte receptor complex (LRC). Natural killer cell receptors are emerging as a rapidly evolving gene family which can display significant intra- and interspecific variation. To date, most studies have focused on eutherian mammals, with significantly less known about the evolution of these receptors in marsupials. Here, we describe the identification of 43 immunoglobulin domain-containing LRC genes in the genome of the Tasmanian devil (Sarcophilus harrisii), the largest remaining marsupial carnivore and only the second marsupial species to be studied. We also identify orthologs of NKC genes KLRK1, CD69, CLEC4E, CLEC1B, CLEC1A and an ortholog of an opossum NKC receptor. Characterisation of these regions in a second, distantly related marsupial provides new insights into the dynamic evolutionary histories of these receptors in mammals. Understanding the functional role of these genes is also important for the development of therapeutic agents against Devil Facial Tumour Disease, a contagious cancer that threatens the Tasmanian devil with extinction.

Keywords

NKC; LRC; Natural killer cell receptors; Evolution; Marsupial; Tasmanian devil

Introduction

The Tasmanian devil is the world’s largest remaining marsupial carnivore. It is currently threatened with extinction due to a contagious cancer, known as Devil Facial Tumour Disease (DFTD), which is transmitted through biting (Pearse and Swift 2006). Recent studies have shown that devils are capable of mounting competent antibody responses; however, the cancer cells are transmitted between individuals without evoking an immune response (Brown et al. 2011; Woods et al. 2007). Most recently, Brown et al. (2011) showed that devils are capable of natural killer cell antibody-dependent cell-mediated cytotoxicity against xenogeneic cancer cells, but not against DFTD cells. Characterisation of immune gene repertoires is particularly important for the development of specific immunological reagents necessary to monitor immune responses to vaccines. Further, dasyurids (carnivorous marsupials), including the Tasmanian devil, display a higher predisposition to neoplasia compared to other mammals (Holz 2008), and it is possible that the immune system of Tasmanian devils could reflect this predisposition. Here, we focus on characterising the natural killer (NK) cell receptor genes from the recently sequenced Tasmanian devil genome (Miller et al. 2011; Murchison et al. 2012).

NK cells play a crucial role in both the innate and adaptive immune responses via a repertoire of cell surface receptors. Mammalian NK receptors are classified into two functionally analogous, but unrelated superfamilies depending on the structure of their extracellular ligand-binding domains. Gene families that encode receptors containing a C-type lectin-like domain (CTLD) belong to the C-type lectin superfamily, and are encoded in a gene complex known as the natural killer complex (NKC). In contrast, gene families that encode receptors containing extracellular immunoglobulin (Ig) domains belong to the immunoglobulin superfamily, and are encoded in a region known as the leukocyte receptor complex (LRC). These regions are located on distinct chromosomes: in humans, they are located at 12p12.3-13.1 and 19q13.4, respectively (Martin et al. 2002). A summary of key genes present in different lineages of mammals and birds is found in Table 1.
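Table 1

Presence of selected NKC and LRC encoded genes in different species for comparison with the Tasmanian devil

Complex  Gene    Human  Mouse  Opossum  Chicken  Devil
LRC      NCR1    X      X      -        -        -
LRC      FCAR    X      X      -        -        -
LRC      GPVI    X      X      X        -        X
LRC      LAIR1   X      X      -        -        -
LRC      LAIR2   X      -      -        -        -
LRC      KIRs    X      -      -        -        -
LRC      PIRs    -      X      -        -        -
LRC      LILRs   X      -      -        -        -
LRC      CHIRs   -      -      -        X        -
LRC      MAIRs   -      -      X        -        X
LRC      Digs    -      -      -        -        X
NKC      CLEC1A  X      X      X        -        X
NKC      CLEC1B  X      X      X        -        X
NKC      CLEC4E  X      X      X        -        X
NKC      CD69    X      X      X        X        X
NKC      KLRK1   X      X      X        -        X
NKC      KLRG1   X      X      -        -        -
NKC      OLR1    X      X      -        -        -
NKC      KLRA    Ψ      X      -        -        -

X denotes presence of gene, Ψ denotes pseudogene, - denotes a blank cell (gene not marked as present)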

Of the 17 different groups of CTLDs defined (Cummings and McEver 2009; Zelensky and Gready 2005), proteins encoded in the NKC belong to groups II and V (Hao et al. 2006). In mammals, group II receptors are predominantly expressed on the surface of dendritic cells and macrophages, while receptors belonging to group V display a broader expression on NK cells, macrophages and dendritic cells (Zelensky and Gready 2005). The group V receptors include the killer cell lectin-like receptors (KLRs), as well as many broader C-type lectin (CLEC) genes which, unlike the former, do not appear to function as NK receptors (Hao et al. 2006).

The LRC in mammals can be divided into several gene subfamilies based on organisation, structure and phylogeny (Martin et al. 2002). These include the killer cell immunoglobulin-like receptors (KIRs), leukocyte Ig-like receptors (LILRs) and leukocyte-associated Ig-like receptors (LAIRs); glycoprotein VI (GPVI); the single-copy natural cytotoxicity receptor (NCR1) and the receptor for the IgA fragment (FCAR). Adjacent to the LRC is a region known as the ‘extended LRC’, comprising several additional related loci that have evolved through multiple duplication events (Barrow and Trowsdale 2008). These molecules include the sialic acid-binding Ig-like lectins (SIGLECs), carcinoembryonic antigen-related cell adhesion molecules (CEACAMs) and the pregnancy-specific glycoprotein genes.

The composition of the NKC and the LRC can vary significantly between species due to lineage-specific expansions and contractions (Kelley et al. 2005; Yoder and Litman 2011). For example, the LRC-encoded KIR gene family has expanded in most primates, while an expanded NKC-encoded KLRA gene family serves as a functional equivalent in rodents and horses (Kelley et al. 2005; Takahashi et al. 2004; Yoder and Litman 2011). Examinations of more distantly related species have also revealed an expansion of Ig-like receptors (CHIRs) in the chicken (Laun et al. 2006; Rogers et al. 2005) and C-type lectin receptors in the platypus (Wong et al. 2009). Recent studies have also revealed the presence of novel IgSF immune-type receptors in the zebrafish and Japanese medaka that may play similar roles in NK cell activity in bony fish (Desai et al. 2008; Yoder et al. 2008; Yoder 2004). While C-type lectin receptors have also been described in two cichlid species (Sato et al. 2003), it has been questioned whether these represent group V C-type lectins (Zelensky and Gready 2004). This highlights the need for functional and further characterisation studies to understand the variation present in the evolution of these receptors in different species.

NK cell receptor gene clusters have been relatively well characterised in a range of eutherian species (see review in Yoder and Litman 2011), but our understanding of these gene clusters in marsupials remains limited. Marsupials occupy an important position in the vertebrate phylogenetic tree, having last shared a common ancestor with eutherian mammals about 148 million years ago (Bininda-Emonds et al. 2007). To date, the only marsupial to have its NK receptors described is the South American grey short-tailed opossum (Monodelphis domestica) (Belov et al. 2007; Hao et al. 2006; Sanderson 2009). The opossum NKC contains eight putative genes: KLRK1, CD69, CLEC4E, CLEC1A, CLEC1B and three unidentified genes, Modo1, -2 and -3 (Sanderson 2009). Within the LRC, a potential ortholog of GPVI and three SIGLEC genes have been identified (Belov et al. 2007). Additionally, Belov et al. (2007) identified 124 KIR/LILR-like Ig domains that could not be clearly assigned to known eutherian gene families, and suggested that these may form marsupial-specific Ig-like receptors (MAIRs). Since then, a further 21 CEACAMs and at least 10 SIGLECs have been identified in the opossum (Cao et al. 2009; Kammerer and Zimmermann 2010).

The Tasmanian devil and opossum lineages diverged approximately 80 million years ago (Kirsch et al. 1997), providing a unique opportunity to compare two distantly related marsupial lineages with two of the best characterised eutherian lineages (human and mouse) which shared a common ancestor at approximately the same time [∼90 million years ago (Hedges et al. 2006)]. The characterisation of these receptor clusters in marsupials will also provide insight into the dynamic evolutionary history of these genes. This article describes the identification of NK receptor genes in the Tasmanian devil genome.

Materials and methods

Genome searches and gene prediction

Profile hidden Markov models (HMMs) were generated for NKC and LRC genes as described previously (Wong et al. 2009). Briefly, Ig domains were extracted from characterised LRC genes using SMART (http://smart.embl-heidelberg.de/), while the NKC profile HMM was constructed using the conserved initial exon from characterised C-type lectin superfamily NKC genes. Profile HMMs were used to search the six-frame translation of the Tasmanian devil genome (DEVIL7.0 assembly, GCA_000219685.1; Murchison et al. 2012) using HMMER v3.0 (http://hmmer.janelia.org/). All HMMER hits with E values less than or equal to 10 were retained as potential NKC and LRC genes; this permissive cutoff was selected to ensure that most, if not all, putative LRC and NKC genes would be retrieved (Hao et al. 2006; Wong et al. 2009).
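The search step above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the `(hit_name, e_value)` tuple layout and the toy sequence are assumptions made for illustration, and the standard genetic code is assumed. It shows how a nucleotide sequence can be translated in all six reading frames and how hits can be retained at the permissive cutoff of E ≤ 10.

```python
# Standard genetic code, built from the canonical TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement; non-ACGT symbols (e.g. N) are kept."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to peptides."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


def retain_hits(hits, max_evalue=10.0):
    """Keep (hit_name, e_value) tuples with e_value <= max_evalue."""
    return [hit for hit in hits if hit[1] <= max_evalue]


if __name__ == "__main__":
    print(six_frame_translation("ATGGCC"))
    # Hypothetical hits, for illustration only.
    print(retain_hits([("hitA", 1e-5), ("hitB", 10.0), ("hitC", 12.0)]))
```

A cutoff this permissive admits many false positives by design, which is why the downstream gene prediction and BLAST steps are needed to discard spurious hits.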

The resulting HMMER hits were padded by 5 kb of genomic sequence prior to extraction. Gene predictions were performed using the ab initio program, GENSCAN (http://genes.mit.edu/GENSCAN.html), according to default parameters. As it is common for LRC genes to contain multiple Ig domains, Ig domains located on the same supercontig within a reasonable ‘gene-like’ distance (usually 5,000 bp) were grouped, and gene predictions performed around the grouped regions. Sequences were analysed using the Basic Local Alignment Search Tool (BLAST) (Altschul et al. 1990) to identify homology to known proteins. Gene predictions where the best hit (lowest E value) matched NKC or LRC proteins were retained as putative NKC and LRC genes. HMMER hits that did not produce any predictions or alternatively produced predictions displaying no significant similarity to known genes were not considered as potential NK receptors and removed from further analyses. Where key conserved residues appeared to be absent from multiple sequence alignments, or C-type lectin domains were incomplete, a different gene prediction software, Softberry FGENESH+ (http://linux1.softberry.com/all.htm), was used to re-predict the sequences, incorporating protein homology information (the BLAST best hit).
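The padding and grouping steps described above can be sketched as follows. This is a hedged illustration rather than the code used in the study: the function names, the `(supercontig, start, end)` tuple layout, the 1-based coordinates and the example values are all assumptions, and the 5,000 bp values are the "usual" distances quoted in the text.

```python
from collections import defaultdict


def pad_hit(start, end, contig_length, pad=5000):
    """Extend a hit by `pad` bp on each side, clipped to the contig ends."""
    return max(1, start - pad), min(contig_length, end + pad)


def group_domains(hits, max_gap=5000):
    """Group Ig-domain hits lying within `max_gap` bp on one supercontig.

    `hits` is an iterable of (supercontig, start, end) tuples.
    Returns a list of (supercontig, start, end, n_domains) tuples.
    """
    by_contig = defaultdict(list)
    for contig, start, end in hits:
        by_contig[contig].append((start, end))

    groups = []
    for contig, spans in by_contig.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        count = 1
        for start, end in spans[1:]:
            if start - cur_end <= max_gap:
                # Close enough to belong to the same putative gene.
                cur_end = max(cur_end, end)
                count += 1
            else:
                groups.append((contig, cur_start, cur_end, count))
                cur_start, cur_end, count = start, end, 1
        groups.append((contig, cur_start, cur_end, count))
    return groups


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    example = [
        ("sc1", 1000, 1300),
        ("sc1", 4000, 4300),
        ("sc1", 20000, 20300),
        ("sc2", 500, 800),
    ]
    print(group_domains(example))
    print(pad_hit(3000, 3300, contig_length=100000))
```

Each grouped region would then be passed to the gene predictor as a single window, so that multi-domain receptors are predicted as one gene rather than as several single-domain fragments.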

Functional motifs and residues such as immunoreceptor tyrosine-based inhibitory motifs (ITIMs) and charged residues were manually identified and noted. Transmembrane (TM) domains were predicted using the TMHMM server (v2.0) (http://www.cbs.dtu.dk/services/TMHMM/), and signal peptides were predicted using SignalP v3.0 (http://www.cbs.dtu.dk/services/SignalP/). Amino acid identity was examined using BioEdit (Hall 1999) based on the multiple sequence alignments.
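Motifs were identified manually in this study, but the same check can be approximated programmatically. The sketch below is an illustration under stated assumptions: it uses the commonly cited ITIM consensus (S/I/V/L)xYxx(I/V/L), which is not specified in this article, and the function names and toy peptides are hypothetical. It reports candidate ITIMs and charged residues falling inside a predicted TM segment.

```python
import re

# Assumed ITIM consensus (S/I/V/L)xYxx(I/V/L); the lookahead lets
# overlapping candidate motifs be reported as well.
ITIM_PATTERN = re.compile(r"(?=([SIVL].Y..[IVL]))")


def find_itims(protein):
    """Return (1-based position, motif) for each candidate ITIM."""
    return [
        (match.start() + 1, match.group(1))
        for match in ITIM_PATTERN.finditer(protein.upper())
    ]


def charged_in_tm(protein, tm_start, tm_end, charged="RKDE"):
    """Return (1-based position, residue) for charged residues that lie
    within a TM segment given as 1-based inclusive coordinates."""
    return [
        (pos, residue)
        for pos, residue in enumerate(protein.upper(), start=1)
        if tm_start <= pos <= tm_end and residue in charged
    ]


if __name__ == "__main__":
    # Toy peptides, not real devil sequences.
    print(find_itims("MAAVTYAQLGG"))
    print(charged_in_tm("AAALLRLLAAK", tm_start=3, tm_end=9))
```

A pattern match of this kind only flags candidates; whether a motif is functional depends on its location in the cytoplasmic tail and on experimental evidence.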

Phylogenetic analyses

Protein and domain sequences were aligned using MUSCLE (Edgar 2004) via the software package, MEGA v5.05 (Tamura et al. 2011), using default parameters. Phylogenetic trees were constructed using the neighbour-joining algorithm, with pairwise deletion of gaps under the p-distance model and evaluation through 1,000 bootstrap resamplings in MEGA v5.05. Trees were constructed around individual C-type lectin and Ig domains, as well as full peptide sequences. Individual C-type lectin domains and immunoglobulin domains were extracted from eutherian, chicken and opossum sequences using SMART.
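For readers unfamiliar with the distance measure named above, the following sketch computes a p-distance with pairwise deletion of gaps for two aligned sequences. It is an illustration only and does not reproduce MEGA; the function names, the choice of `-` and `?` as gap or missing-data symbols, and the toy alignments are assumptions. The p-distance is the proportion of compared sites at which two sequences differ, and under pairwise deletion a site is skipped only for the pair in which at least one sequence has a gap.

```python
def p_distance(seq_a, seq_b, gap_chars="-?"):
    """Proportion of differing sites between two aligned sequences,
    ignoring sites where either sequence has a gap (pairwise deletion)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # site is dropped for this pair only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


def percent_identity(seq_a, seq_b):
    """Amino acid identity (%) over the same set of compared sites."""
    return 100.0 * (1.0 - p_distance(seq_a, seq_b))


if __name__ == "__main__":
    # Toy alignments, for illustration only.
    print(p_distance("ACD-EF", "ACE-EF"))
    print(percent_identity("AC-DE", "ACGDF"))
```

A neighbour-joining tree would be built from the matrix of such pairwise distances, with bootstrap support obtained by resampling alignment columns with replacement and rebuilding the tree for each replicate.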

Comparison to automated gene builds

During the course of this study, Ensembl released an automated annotation of the Tasmanian devil genome (available from http://www.ensembl.org/Sarcophilus_harrisii/Info/Index). Searches were performed of the annotated database for comparison with genes identified through manually curated prediction.

Results and discussion

Through a combination of in silico searches, gene prediction and phylogenetic analyses, we report the first identification of six putative NKC genes and an expansion of Ig domains forming 24 LRC-like open reading frames (ORFs) plus 19 extended LRC-like genes in the genome of the Tasmanian devil. The locations of these within the genome are provided in Electronic supplementary material (ESM) Table S2.

Identification and characterisation of putative NKC genes

The Tasmanian devil NKC is found on chromosome 5 with predicted genes located across two supercontigs: supercontig79 (five genes; 598.363 kb) and supercontig238 (one gene; 154.722 kb). Predicted sequences are provided in the ESM. Orthologs of six therian genes KLRK1, CLEC4E, CD69, CLEC1A, CLEC1B and Modo1 were identified (Fig. 1). Orthology of predicted genes was determined using conservation of synteny, phylogenetics and the identification of conserved motifs and residues consistent with protein structure and function. A bootstrap cutoff of 60 was selected for categorisation as orthologous based on previous studies (Sanderson 2009; Wong et al. 2009). Separate detailed alignments for these can be found in the ESM Figs. S1–5. Five of our six NKC predictions (except CLEC2-like) were annotated by the automated gene build.
Fig. 1

Amino acid neighbour‐joining tree of devil C‐type lectin‐like domain sequences with eutherian, opossum, and non‐mammalian sequences. Predicted devil natural killer complex (NKC) proteins are shown in blue, with eutherian sequences in red, opossum in green, and non‐mammalian in yellow. Two opossum sequences, Modo2 and Modo3, initially displayed similarity to CLEC5A and CLEC2L, and therefore these sequences were also included in the comparisons

KLRK1 encodes the receptor NKG2D, which is expressed on both T cells and NK cells as a homodimer and plays an important role in the response to tumours, as well as tumour immune evasion (Burgess et al. 2008). CLEC4E is expressed predominantly on cells of myeloid origin (Flornes et al. 2004), and has been shown to function as an activating receptor which senses damaged cells and plays an important role in immune responses to fungi (Yamasaki et al. 2008, 2009). CD69 is constitutively expressed on activated T cells and platelets, and can be induced on NK cells, neutrophils and B cells (Ziegler et al. 1993). CD69 has been suggested to play a role in regulating immune activation and the differentiation of T lymphocytes (Martin et al. 2010). CLEC1A (also known as CLEC-1) is expressed in dendritic and endothelial cells (Sobanov et al. 2001), and is involved in the regulation of T cell activity (Thebault et al. 2009). CLEC1B (also known as CLEC-2) shows high levels of expression on platelets and megakaryocytes, and lower expression on monocytes and dendritic cells (Séverin et al. 2011), though CLEC1B mRNA has also been documented in NK cells (Colonna et al. 2000). Ligands of CLEC1B include the snake toxin, rhodocytin, and podoplanin, and the receptor itself is important in platelet aggregation (Ozaki et al. 2009; Suzuki-Inoue et al. 2006).

Devil CLEC2-like appeared to represent a putative ortholog of Modo1 in the opossum, which was previously identified by Sanderson (2009) (amino acid identity of 41.4 %). Phylogenetic analyses revealed that both the devil and opossum proteins appeared to be related to the CLEC2 family (including CLEC2A, CLEC2B, CLEC2C or CD69, and CLEC2D). Orthologs of Modo2 and Modo3 were not identified in the devil genome (additional details in ESM).

Comparison of the Tasmanian devil and opossum NKC reveals a strikingly conserved organisation, with gene order and gene orientation almost completely conserved (Fig. 2). The Tasmanian devil NKC proteins predominantly displayed higher amino acid identity to orthologs in the opossum than to those in eutherians. NKG2D showed the most consistent amino acid conservation between species, with sequence identity of approximately 70 % across the mouse, human, opossum and devil, consistent with previous reports of similarity between human and mouse NKG2D (Eagle and Trowsdale 2007). The characteristic WIGL motif and its variations (e.g. WTGL, WMGL, YIGL and WFGL) were identified in all predicted NKC proteins, as were conserved cysteine residues, pairs of which are believed to be vital to the formation of the CTLD fold through disulfide bridge formation (Zelensky and Gready 2003, 2005). Transmembrane domains were predicted for five of the six proteins. The absence of a predicted transmembrane domain in the sixth protein, CLEC4E, is also observed in the opossum ortholog, and could represent an artefact of sequence assembly. More group V receptors than group II receptors were observed in the devil NKC (CLEC4E was the only group II protein), which is to be expected as proteins functioning as NK receptors belong to group V (Hao et al. 2006).
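The motif check described above can be expressed as a short Python sketch. It is illustrative only: the function names and the toy peptide are hypothetical, and the pattern covers just the five variants listed in the text (WIGL, WTGL, WMGL, YIGL and WFGL) rather than any broader consensus.

```python
import re

# Only the variants named in the text are matched.
WIGL_VARIANTS = ("WIGL", "WTGL", "WMGL", "YIGL", "WFGL")
WIGL_PATTERN = re.compile("|".join(WIGL_VARIANTS))


def find_wigl(protein):
    """Return (1-based position, motif) for each WIGL-type motif."""
    return [
        (match.start() + 1, match.group())
        for match in WIGL_PATTERN.finditer(protein.upper())
    ]


def cysteine_positions(protein):
    """Return 1-based positions of cysteines, the candidate partners
    for the disulfide bridges of the CTLD fold."""
    return [
        pos for pos, residue in enumerate(protein.upper(), start=1)
        if residue == "C"
    ]


if __name__ == "__main__":
    # Toy peptide, not a real devil sequence.
    toy = "MKCPKDWIGLNCY"
    print(find_wigl(toy))
    print(cysteine_positions(toy))
```

In practice these positions would be read off a multiple sequence alignment, so that the motif and cysteines can be confirmed to sit in columns conserved across orthologs.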
Fig. 2

Organisation and transcriptional orientation of genes within the natural killer complex (NKC) of eutherians, opossum and the Tasmanian devil [modified from Hao et al. (2006) and Sanderson (2009)]. Putative NKC genes in the Tasmanian devil were assigned identity based on phylogenetics. Pseudogenes are represented in white, putative functional genes in black and orthologous genes are joined by lines. The sizes in parentheses represent the length of the NKC in different species. Note that the map is not drawn to scale. Opossum sequences, orientations and alignment were obtained from Sanderson (2009)

Identification and characterisation of putative LRC genes

The Tasmanian devil LRC is located on chromosome 3. A final list of 43 ORFs arranged across 32 supercontigs, with highest similarity to the “classical” LRC genes, was identified (Fig. 3). ORFs were denoted ‘devil Ig-like open reading frames’ (or ‘Digs’) and were numbered in order of discovery. Phylogenetic analyses revealed that devil Ig domains are more similar to LRC Ig domains than the Fc type, as in the opossum (Belov et al. 2007) (ESM Fig. S6). In addition, 57 Ig domains formed 19 ORFs displaying strongest similarity to extended LRC genes. The automated gene build also identified two putative novel LILR/ILT-like receptors. The gene build, however, did not identify most putative Dig receptors or GPVI.
Fig. 3

Diagram comparing the predicted Tasmanian devil LRC and extended LRC in relation to the LRC and extended LRC in humans. Each box represents one gene. In relation to the Tasmanian devil predictions, each triangle represents a gene, each vertical line represents a supercontig and the direction of the triangle indicates likely transcription orientation. Lines joining genes indicate orthology based on phylogenetics. Diagram not drawn to scale. Further information on gene location is available in the ESM Table S2 (content adapted from Kelley et al. 2005)

Phylogenetic analyses revealed low levels of similarity between human LILR and Dig15 (27.0 % coding identity to human LILRA5) and eutherian GPVI and Dig12 (31.3 % amino acid identity to mouse GPVI; 15.7 % identity to human GPVI), providing some support that they may be orthologous (ESM Figs. S7 and S8).

In humans, the LRC-encoded glycoprotein VI (GPVI) is expressed on the surface of platelets, and plays an important role in haemostasis, mediating adhesion to collagen molecules which become exposed at sites of vascular injury (Nieswandt and Watson 2003). There is also emerging evidence to suggest that platelet activation could play a role in tumour metastasis (Gay and Felding-Habermann 2011). Leukocyte Ig-like receptors (LILRs; also known as ILTs and CD85s) are a family of immunomodulatory receptors that are predominantly expressed on myelomonocytic cells (e.g. monocytes, dendritic cells and macrophages), though LILRA2 and LILRB1 are also expressed on NK cells (Anderson and Allen 2009; McIntire et al. 2008; Sloane et al. 2004). LILRs are thought to play varied roles, including regulating T cell function, and roles in infectious and autoimmune disease, cancer and tissue rejection (reviewed in Anderson and Allen 2009).

There was limited overall orthology between eutherian and Tasmanian devil Ig domains and receptors, suggesting that these have diversified significantly since the separation of the different lineages (Fig. 4). Indeed, very few orthologs of eutherian receptors have been identified so far in the LRC of the opossum (Belov et al. 2007) and platypus (Wong et al. 2009). Phylogenetic analyses also revealed several devil Ig domains similar to opossum Ig domains, as well as the presence of devil-specific and opossum-specific clades.
Fig. 4

Amino acid neighbour-joining phylogenetic tree displaying homology between individual immunoglobulin (Ig) domains from eutherian (red), opossum (green), devil (blue) and chicken (yellow) LRC proteins. In the interest of size constraints, topology only is shown in this figure. Bootstrap values below 50 are not shown

The observation that devil LILR-like proteins typically contain 1–2 Ig domains suggests a domain structure most similar to the 1–2 domain structure of chicken Ig-like receptors (CHIRs) (Viertlboeck et al. 2005). This is interesting given that eutherian LILRs can contain two to four Ig domains (Barrow and Trowsdale 2008), and MAIRs in the opossum are suggested to comprise one to five Ig domains (Belov et al. 2007). However, it is possible that these proteins may contain additional domains that were unidentified, given the presence of unsequenced regions and short contigs that limited gene prediction.

Conserved residues associated with function were identified in the LRC ORFs. Arginines were observed in TM-like regions in Dig3, 6, 16 and 17, suggesting that these predictions may represent receptors with activatory function. In contrast, the presence of a motif associated with inhibitory signalling in Dig5 suggests that this receptor may be inhibitory.

Proteins encoded by genes located in relatively close proximity on the same supercontig tended to cluster within the same clades in the phylogenetic tree (e.g. Dig9 and 10; Dig16 and 17; Dig4 and 8 on supercontigs505, -1553, and -252, respectively; Fig. 4), as may be expected following evolution by tandem duplication.

Identification and characterisation of putative ‘extended LRC’ genes

Members of the gene families encoded in the mammalian ‘extended LRC’ were also identified. Nineteen ORFs displaying highest BLAST similarity to SIGLECs and CEACAMs were identified across 17 supercontigs on chromosome 3 (ESM Figs. S9 and S10). Six putative SIGLEC genes were identified, including orthologs of MAG and CD22 (bootstrap support of 63 or greater), and putative orthologs of two previously identified putative opossum SIGLEC genes (Belov et al. 2007). In addition, 13 predicted CEACAM genes were identified, including a putative ortholog of human CEACAM16. The automated gene build identified seven SIGLECs (MAG, CD22, SIGLEC1, SIGLEC15, SIGLEC16 and two unidentified SIGLECs). Direct searches did not identify any further SIGLECs or any CEACAMs.

The remaining SIGLEC and CEACAM predictions displayed limited homology to eutherian proteins, and may represent marsupial-specific and devil-specific receptors. Interestingly, the arrangement of CD33-related SIGLEC genes into two subclusters, inverted vis-à-vis each other, as observed in eutherians and the opossum (Cao et al. 2009) was not observed in the Tasmanian devil, with all SIGLEC genes tandemly arranged and appearing to form only one cluster. It is thought that a single ancestral cluster underwent inverse duplication to form two subclusters before the divergence of the eutherian and marsupial lineages (Cao et al. 2009). It is possible that the devil lineage may have lost several CD33-related SIGLEC genes since diverging from the opossum lineage.

Conserved residues and motifs were identified in the putative SIGLEC and CEACAM ORFs. Inspection of protein sequences suggests that SIGLEC I and II both contain the conserved ‘essential’ arginine residue which, when mutated, results in loss of ligand-binding ability (Varki and Crocker 2009). Single ITIMs were identified in SIGLEC IV, SIGLEC III and CEACAM III, supporting a putative inhibitory function. Similarly, two ‘essential’ arginines were identified in CD22, together with five ITIMs in the cytoplasmic region, consistent with previous studies (Blasioli et al. 1999; Walker and Smith 2008). In addition, transmembrane domains were predicted for seven extended LRC ORFs, while signal peptides were predicted for six extended LRC-like ORFs.

Accuracy of gene number estimates and gene predictions

Although 43 LRC-like and extended LRC-like ORFs were identified in the Tasmanian devil genome, it is important to note that this does not necessarily mean that the Tasmanian devil possesses 43 functional LRC and extended LRC receptor loci. A consequence of the rapid birth-and-death evolution is the presence of many pseudogenes; thus, we would expect to see this in relation to families of NK receptor genes (Nei and Rooney 2005). For instance, of the 34 Ly49 loci identified in the rat, close to half are predicted to be putative non-functional pseudogenes (Nylenna et al. 2005), while 41 of the 103 CHIR loci identified in chicken are thought to represent pseudogenes (Laun et al. 2006). Therefore, it is possible that some genes may be pseudogenes, and functional studies will shed light on this.

Evolutionary history of NKC and LRC complexes and receptors

The minimal conserved NKC observed in the opossum and devil genomes may indicate that this conserved gene content was present in the common ancestor of therian mammals, with further gene duplications occurring in eutherian lineages. NKG2D, CLEC1B, CLEC1A, CD69 and CLEC4E are found in the human, mouse, rat, cattle, dog, opossum and devil genomes. Of these, only an ortholog of CD69 has been identified in the platypus (Wong et al. 2009) and the chicken (Chiang et al. 2007). OLR1 and KLRG1 are conserved in eutherians (Hao et al. 2006), and although neither has been identified yet in marsupials, OLR1 has been identified in the platypus (Wong et al. 2009). No KLRA (or Ly49) genes were observed in devils or in the opossum (Belov et al. 2007) or platypus (Wong et al. 2009), suggesting that this eutherian-specific family emerged following the divergence of the two therian lineages. CD94/NKG2 receptors are typically present in multiples in eutherians (Yoder and Litman 2011), but only a single CD94/NKG2 has been described in the opossum (Belov et al. 2007), chicken (Chiang et al. 2007) and now the devil, suggesting that the gene expansion has occurred only recently in the eutherians.

In relation to the LRC, the observation of an expansion of Ig domains in the Tasmanian devil is consistent with the expansion of Ig domains observed in the opossum (Belov et al. 2007) and suggests that devils and opossums most likely rely predominantly on Ig-like NK receptors rather than C-type lectin NK receptors. The limited orthology between marsupial, eutherian and monotreme LRC genes (Belov et al. 2007; Wong et al. 2009) also suggests that these receptors have diversified significantly since the separation of these lineages. The observation of similarity between devil and opossum Ig domains as well as devil-specific and opossum-specific clades suggests that some Ig domains may date back to a common ancestor between Australian and American marsupials, while others are the result of lineage-specific expansions. The results suggest that devils could possess a collection of devil-specific receptors not found in eutherians or opossum. This is likely to reflect the evolution of these species in different geographic regions, particularly in the presence of divergent pathogenic pressures, which have driven the adaptation of lineage-specific expansions of these receptors.

With the NK receptor genes of only two marsupial species characterised, it is too early to suggest that the similarities between the opossum and Tasmanian devil represent marsupial trends in NK receptor repertoire. The recent sequencing of the tammar wallaby (Macropus eugenii; Family: Macropodidae) genome (Renfree et al. 2011) provides a valuable future case study of a third distinct family of marsupials and a second Australian marsupial. Comparison with the receptors present in the tammar wallaby may reveal conserved receptors that are adapted to pathogenic pressures present within the Australian setting and provide insights into differences between immune systems of carrion-eating marsupial carnivores and marsupial herbivores.

Comparison to automated gene builds

During the course of this study, Ensembl released an automated annotation of the Tasmanian devil genome (available from http://www.ensembl.org/Sarcophilus_harrisii/Info/Index). Five of our six NKC predictions (except CLEC2-like) were annotated. The automated gene build also identified two putative novel LILR/ILT-like receptors and seven SIGLECs (MAG, CD22, SIGLEC1, SIGLEC15, SIGLEC16 and two unidentified SIGLECs). The gene build, however, did not identify most putative Dig receptors, GPVI or any CEACAMs. These results are consistent with our previous work, which suggests that many highly divergent immune molecules are not identified by automated gene builds and require manual annotation (Wong et al. 2011).

Conclusion

With the characterisation of the Tasmanian devil NKC and LRC, we are beginning to understand the evolution of these receptors in marsupials. The most striking features are the similarity to the opossum NKC and the presence of Tasmanian devil-specific receptors. In summary, we have shown that the Tasmanian devil genome contains six NKC receptors which are orthologous to those in the opossum, as well as up to 24 classical and 19 extended LRC open reading frames. We present a first glimpse into the likely repertoire of the Tasmanian devil, only the second marsupial to be studied, thus generating a more detailed picture of the potential diversity of receptors present in marsupials. Future studies will begin to unravel the role of these receptors in marsupial immunity, particularly in relation to the Tasmanian devil immune system. Indeed, it is now possible to more closely study these genes, particularly in relation to DFTD.

Acknowledgments

This work was supported by the Australian Research Council. KB is an ARC Future Fellow. We thank Tony Papenfuss from the Walter and Eliza Hall Institute for access to genomic resources.

Supplementary material

ESM 1 (DOCX, 1.18 MB)

Copyright information

© Springer-Verlag 2012