The major histocompatibility complex (MHC) is one of the most immunologically significant and genetically diverse regions of the vertebrate genome (Parham and Ohta 1996; Trowsdale 2011). An understanding of the nature, function and maintenance of allelic diversity at MHC loci in the major livestock species is increasingly important for the development of new and improved vaccines for global food security. The Comparative MHC Nomenclature Committee provides guidance in the development of standardised nomenclature systems for alleles at MHC loci in non-human species (Ballingall et al. 2018a). The Ovine Nomenclature Committee has facilitated the development of a nomenclature system for alleles at the highly polymorphic MHC class II DRB1 locus (Ballingall et al. 2011), with over 100 alleles having received official names. An associated database of these sequences is maintained in the ovine section of the IPD-MHC Database (https://www.ebi.ac.uk/ipd/mhc/). However, progress in developing an official nomenclature system for the DQ loci in sheep has been slower due to complexities associated with gene duplication, haplotype variation and limited sequence availability. As a result, no officially named alleles are currently available for the ovine DQ genes, which creates ambiguity within the research community when describing alleles at each of these loci.

The DQ repertoire in sheep is complicated by a functional duplication of the DQA and DQB genes (Scott et al. 1987, 1991a, b). Haplotypes analysed to date have two DQA genes, with the majority including DQA1 and DQA2 loci (Scott et al. 1991a; Hickford et al. 2007). However, in a number of haplotypes, the DQA1 gene appears absent (DQA1 null; Scott et al. 1991a; Fabb et al. 1993). In these haplotypes, the DQA2 locus is found in combination with a second locus which phylogenetic analyses of the second exon suggest is more closely related to DQA2 than to DQA1 (Hickford et al. 2004). Hence, this locus is named DQA2-like. As in other mammals, genomic analysis has indicated that the DQ genes in sheep occur as closely linked A/B gene pairs, with DQA1 linked to DQB1 and DQA2 linked to DQB2 (Wright and Ballingall 1994; Herrmann-Hoesing et al. 2008). No similar genomic analysis of haplotypes including DQA2-like sequences has been reported; however, transcriptional analysis of MHC homozygous animals identified unique transcripts which co-express with DQA2-like genes. It is likely that these represent the partner of DQA2-like and have therefore been termed DQB2-like (Ballingall et al. 2018b).

The ISAG/IUIS-VIC OLA Nomenclature Committee now considers that sufficient sequence data is available to propose a nomenclature framework for alleles at each of the DQ genes in sheep. We propose a nomenclature system for alleles at the Ovar-DQA1, DQA2, DQA2-like, DQB1, DQB2 and DQB2-like genes. This will be developed using 82 full-length and partial DQA and DQB transcripts and will take an approach similar to that used to assign cattle MHC class I sequences to individual loci (Hammond et al. 2012).

The majority of the full-length DQA and DQB transcripts used to develop the nomenclature have previously been described (Ballingall et al. 2015, 2018b). Additional full-length transcripts are derived from MHC haplotype analysis within the Soay (Kara Dicks, personal communication) and Argos breeds (Panoraia Kyriazopoulou, personal communication). An additional 31 previously unpublished DQB transcripts covering 575 bp between exon 2 and exon 4 were included in the development of the DQB nomenclature. These sequences were derived from cloned PCR products, amplified from cDNA prepared from RNA extracted from PBMC from a Rambouillet flock using the primers and amplification conditions described by Herrmann-Hoesing et al. (2008). Each DQ nucleotide sequence has been submitted to the IPD-MHC Database (https://www.ebi.ac.uk/ipd/mhc) to allow construction of the allelic databases. The origin and identity of each DQA and DQB sequence are described in Tables 1 and 2 respectively. The recently updated tools available on the IPD-MHC Database (Maccari et al. 2017) allow alignment of nucleic acid or amino acid sequences from individual or multiple loci within and across species. Alignments of all DQA and DQB nucleotide sequences are provided in Supplementary Figures 1 and 2 respectively, or may be generated using the tools available on the IPD-MHC Database.

Table 1 List of DQA sequences with the proposed allelic nomenclature
Table 2 List of DQB sequences with the proposed allelic nomenclature

In our proposed nomenclature, gene specificity is based on nucleotide similarity to full-length reference sequences derived from MHC homozygous animals, which in turn are based on sequence similarity to the initial descriptions of linked DQA1/B1 and DQA2/B2 genes on cosmid and BAC clones (Wright and Ballingall 1994; Herrmann-Hoesing et al. 2008). Sequence similarity was determined by a BLAST search of an in-house sequence database. Specificity was confirmed by constructing maximum likelihood phylogenetic trees in IQ-TREE (Trifinopoulos et al. 2016) from multiple alignments generated with CLUSTAL Omega (Sievers et al. 2011). The model selection tool (Kalyaanamoorthy et al. 2017) within IQ-TREE was used to select the optimum substitution models prior to phylogenetic tree estimation. The optimum substitution models selected for the DQA and DQB sequences were the Kimura 2 parameter (K2P, Kimura 1980) +R3 and K2P +G4 respectively. Tree topology was tested with 10,000 bootstrap replicates using the ultrafast bootstrap method of Minh et al. (2013). The DQA and DQB trees are shown in Figs. 1 and 2 respectively. The clustering of sequences in Figs. 1 and 2 is consistent with alleles at each of the DQ genes in sheep.
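The alignment and tree-building steps described above can be sketched as command-line calls assembled from Python. This is a minimal illustration under stated assumptions, not the procedure the authors ran: it assumes local installations of Clustal Omega (`clustalo`) and IQ-TREE 1.x (`iqtree`, where `-m MFP` requests model selection followed by tree inference and `-bb` sets the number of ultrafast bootstrap replicates), and the file names are hypothetical.

```python
import subprocess
from typing import List


def clustalo_cmd(fasta_in: str, aln_out: str) -> List[str]:
    """Build a Clustal Omega call that writes a FASTA-format alignment."""
    return ["clustalo", "-i", fasta_in, "-o", aln_out, "--outfmt=fa", "--force"]


def iqtree_cmd(aln: str, replicates: int = 10000) -> List[str]:
    """Build an IQ-TREE 1.x call: model selection plus ultrafast bootstrap."""
    return ["iqtree", "-s", aln, "-m", "MFP", "-bb", str(replicates)]


def run_pipeline(fasta_in: str, aln_out: str, dry_run: bool = True) -> List[List[str]]:
    """Return the two commands; execute them only when dry_run is False."""
    cmds = [clustalo_cmd(fasta_in, aln_out), iqtree_cmd(aln_out)]
    if not dry_run:
        for cmd in cmds:
            # check=True raises CalledProcessError if a tool exits with an error.
            subprocess.run(cmd, check=True)
    return cmds


# Hypothetical input and output file names, for illustration only.
commands = run_pipeline("ovar_dqa.fasta", "ovar_dqa.aln.fasta")
for cmd in commands:
    print(" ".join(cmd))
```

With `dry_run=True` the function only prints the commands, which makes it easy to check the options before running them. IQ-TREE 2 uses `-B` in place of `-bb`, so the flag may need adjusting for the installed version.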

Fig. 1 Maximum likelihood tree estimating the relationships between ovine DQA sequences

Fig. 2 Maximum likelihood tree estimating the relationships between ovine DQB sequences

Following sequence comparison and phylogenetic analysis, the nomenclature system was developed for each cluster of sequences, in accordance with the guidance provided in the recent report from the Comparative MHC Nomenclature Committee (Ballingall et al. 2018a). A full-length transcript was selected as the reference sequence for each cluster. For consistency, a single well-defined haplotype, 501b, was selected to provide the reference sequences for the DQA1, DQA2, DQB1 and DQB2 genes (Ballingall et al. 2018b). The 501a haplotype was selected to provide the reference sequences for DQA2-like and DQB2-like (Ballingall et al. 2018b).

To maintain consistency with HLA nomenclature (Marsh et al. 2010), and to follow the guidelines of the MHC Nomenclature Committee (Maccari et al. 2018), the following system is proposed for alleles at each locus. The first field, which follows the species and locus designation (Ovar-DQA1, Ovar-DQB1) and is separated from it by an asterisk (*), represents the allelic group (Ovar-DQA1*01, *02 etc.). Alleles within a group differ by no more than four amino acids within the alpha 1 or beta 1 domain (encoded by the second exon) and by no more than four predicted amino acids throughout the remainder of the transcript. The next field, separated by a colon (:), indicates a coding (non-synonymous) change within the allelic group (Ovar-DQA1*01:01, Ovar-DQB1*01:01) and is manually assigned by the ovine group curator in order of submission to the IPD-MHC Database. The following field, again separated by a colon (Ovar-DQA1*01:01:02, Ovar-DQB1*01:01:02), may be used to indicate silent or synonymous substitutions. The flexibility of the system allows for additional fields to reflect diversity within intronic and regulatory regions.
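The field structure and the allelic group threshold described above can be illustrated with a short Python sketch. It is not part of the IPD-MHC tooling. The regular expression, the function names and the assumption that the amino acid sequences are already aligned to equal length are all hypothetical choices made for illustration, and only the first three fields are parsed.

```python
import re
from typing import NamedTuple, Optional

# Matches names such as Ovar-DQA1*01, Ovar-DQB1*01:01 or Ovar-DQB2-like*02:01:02.
# The six loci come from the proposed nomenclature; the pattern is an assumption.
ALLELE_RE = re.compile(
    r"^(?P<species>Ovar)-(?P<locus>DQ[AB](?:1|2-like|2))\*"
    r"(?P<group>\d{2,})(?::(?P<coding>\d{2,}))?(?::(?P<synonymous>\d{2,}))?$"
)


class AlleleName(NamedTuple):
    species: str
    locus: str
    group: str                 # first field: allelic group
    coding: Optional[str]      # second field: non-synonymous change
    synonymous: Optional[str]  # third field: synonymous change


def parse_allele(name: str) -> AlleleName:
    """Split an allele name into its species, locus and numeric fields."""
    match = ALLELE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised Ovar-DQ allele name: {name!r}")
    return AlleleName(**match.groupdict())


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count mismatched positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def same_allelic_group(domain_a: str, domain_b: str,
                       rest_a: str, rest_b: str, limit: int = 4) -> bool:
    """Apply the threshold: at most `limit` amino acid differences in the
    alpha 1 or beta 1 domain and at most `limit` in the rest of the transcript."""
    return (count_differences(domain_a, domain_b) <= limit
            and count_differences(rest_a, rest_b) <= limit)


print(parse_allele("Ovar-DQA1*01:01:02"))
```

Fields that are absent from a name, such as the second and third fields of `Ovar-DQA1*01`, are returned as `None`. In practice the curator assigns fields manually, so a check like `same_allelic_group` would at most serve as a preliminary screen.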

Requests for official names for DQ sequences will be delivered through the IPD-MHC Database provided that sequence quality guidelines are followed (Ballingall et al. 2018a). Sequences should be submitted to the IPD-MHC Database using the online tools. Ideally, full-length transcripts should be submitted as these simplify gene assignment. The immediate three-prime untranslated regions, which appear to contain gene-specific motifs (Ballingall et al. 2015, 2018b), are especially helpful in gene assignment (see Supplementary Figures 1 and 2). Gene assignment and the subsequent allelic nomenclature will be based on the closest match to other DQ sequences held in the database. In cases where it is not clear from which gene a sequence is derived, the curator may request additional data.

A number of additional gene duplications within the ovine DQ region have recently been reported (Ali et al. 2017; Ballingall et al. 2018b; Kara Dicks, personal communication). It is not yet clear if these represent functional duplications or gene fragments that co-amplify with the primers used. For the purpose of nomenclature, such sequences will be named depending on their clustering with other DQ sequences.

In summary, here we provide nomenclature for 82 alleles at the duplicated DQ genes within the MHC class II region of sheep. Associated databases of alleles are available within the IPD-MHC Database for alignment and download. Such a resource will support research in ruminant immunology, vaccine development, comparative studies of MHC evolution and population-based analyses of MHC diversity and disease. This is the first allelic nomenclature system proposed for the DQ genes in a ruminant species which assigns sequences to individual loci based on sequence and phylogenetic analysis. It may also provide a framework for the DQ genes in other farmed ruminant species including cattle and goat.