Allele Diversity of the Major Histocompatibility Complex in the Common Hamster (Cricetus cricetus) in Urban and Rural Populations

Based on the results of targeted sequencing on the Illumina platform we compared the allelic diversity of exon 2 of the DRB gene in urban (city of Simferopol) and rural populations of the common hamster. The urban population significantly differs from the rural one in terms of the composition and diversity of gene alleles. For individuals living in the city, a larger number of alleles, higher values of haplotype and nucleotide diversity indices, and a smaller proportion of individuals with a homozygous genotype were noted. Both populations are characterized by a significant excess of non-synonymous substitutions over synonymous ones and almost every allele of a gene corresponds to a different amino acid sequence. However, the influence of positive selection on the diversity of variants of antigen-binding sites in the alleles of the DRB gene in urban conditions is much more pronounced. The data suggest that resistance of common hamsters living in specific and varied conditions of the urban environment to various kinds of pathogenic load is higher than in rural populations.


INTRODUCTION
The rapid spread of urban ecosystems worldwide can be considered a qualitatively new stage in the development of life on Earth. By 2030, almost 10% of the land will be urbanized (Schilthuizen, 2018). The growth of urbanized landscapes inevitably draws into urban ecosystems new species that previously showed no tendency toward synanthropization. Although urban conditions are entirely unsuitable for many species, others can populate those urban biotopes that meet their biological needs. For some species urban conditions even prove favorable, and their population density in cities can exceed that in natural biotopes. Species whose numbers tend to decline in natural biotopes but which thrive in cities include the peregrine falcon (Falco peregrinus), which has populated the cities of North America and Europe in the last 30 years (Sorokin, 2002), and Cooper's hawk (Accipiter cooperii), which inhabits some cities in the United States (Boggie and Mannan, 2014; Morinha et al., 2016). Thus, cities have in effect become places of conservation for these rare species.
Identifying the favorable and unfavorable factors that determine whether the urban environment is suitable for animals, plants, and humans is undoubtedly one of the most pressing problems in ecology. One of the most important directions in the study of synurbanization is the investigation of the mechanisms by which animals adapt to new conditions, together with the definition of parameters for assessing their well-being. The urban environment has many features that significantly distinguish it from natural biotopes. These include both abiotic factors (temperature, humidity, heavy metal pollution, light and noise pollution) and biotic factors (invasive species forming new, previously nonexistent communities, increased population density, an uncharacteristic food supply, etc.) (Luniak, 2004). As a result, new parasite-host relationships and a higher parasite load should be expected in the city. It is known that, in general, urban populations of mammals and birds are more susceptible to parasite infestation and encounter a greater number of pathogens than rural ones (Gliwicz et al., 1980; Luniak, 2004). Moreover, in the city a species encounters pathogens and parasites that are not typical of natural biotopes. The question arises as to how the immune system of synurbic animals responds to these urban "challenges." Evaluation of the allelic diversity of major histocompatibility complex (MHC) genes can serve as an indicator of the capacity of a species to counteract the negative factors of the urban environment. These genes play a key role in the organism's immune defenses, including the response to pathogen load (Klein, 1986; Hill et al., 1991; Potts and Wakeland, 1993; Brown and Eklund, 1994; Hedrick, 1994; Edwards and Potts, 1996; Janeway et al., 2004; Acevedo-Whitehouse and Cunningham, 2006; Ujvari and Belov, 2011).
A study of populations of white-footed mice (Peromyscus leucopus) living in New York parks showed that in urban environments the genes responsible for the immune characteristics of individuals (as well as a number of others, for example, those responsible for the processing of foods rich in fats) are under pronounced positive selection (Harris et al., 2013, 2016; Harris and Munshi-South, 2017). Studies of bobcats (Lynx rufus) in Thousand Oaks (USA), which faced an epidemic of mange, showed that different sets of alleles of the MHC and TLR genes (Toll-like receptors, which recognize conserved structures of microorganisms and activate the cellular immune response) were present before and after the epidemic. Apparently, only those animals that had a suitable combination of immune genes survived the epidemic (Serieys et al., 2015).
One of the mammal species actively populating urban areas is the common hamster (Cricetus cricetus). In natural biotopes, the abundance of this species has declined catastrophically over the past half century, especially in the western part of the range (Surov et al., 2016b), to the point that in 2020 the species was included in the IUCN Red List with CR (critically endangered) status (Banaszek et al., 2020). At the same time, colonies of the common hamster in cities often reach high abundance (Feoktistova et al., 2013, 2017; Surov et al., 2016a, 2016b). To date, populations of this species live in Nalchik, Grozny, Kislovodsk, Vladimir, Omsk, Tula, Ryazan, Moscow and some other cities of Russia (Feoktistova et al., 2019); they are also known from Nur-Sultan in Kazakhstan, as well as from a number of European cities (Niethammer, 1982; Thorns, 1998; Endres and Weber, 1999; Kupfernagel, 2003; Losík et al., 2007; Schmelzer and Millesi, 2008; Banaszek and Ziomek, 2010; Čanády, 2013; Feoktistova et al., 2013; Matysek et al., 2013; Petrová et al., 2018; Buczek, 2019).
The object of our study was the population of the common hamster inhabiting the city of Simferopol (Crimea), the largest known urban population of this species (Tovpinets et al., 2006; Feoktistova et al., 2016). We determined the allelic diversity of exon 2 of the DRB gene, which is part of MHC class II in mammals. The values obtained were compared with those for hamster colonies located in the rural anthropogenic landscape of the Crimea. The purpose of the work was to assess the impact of living in the urban community on the features of the immune system of this species. A significant difficulty in studying the allelic diversity of MHC genes is the high number of nucleotide substitutions that distinguish different alleles and, in some cases, a high copy number. To determine the substitutions that distinguish each of the alleles present in the genotype of an animal, laborious methods such as cloning or analysis of single-strand conformational polymorphism (SSCP) followed by Sanger sequencing in many replicates are traditionally used. In our work, we used next-generation sequencing (NGS) technology, which is currently successfully replacing traditional methods (Shiina et al., 2015). Because reads of up to 250 bp can be obtained while the region of interest is 246 bp long, each sequence variant could be determined directly, without the need to combine individual fragments of the molecule.

MATERIALS AND METHODS
For molecular genetic analysis, we used tissue samples from 20 common hamsters that we had previously caught in the city of Simferopol and from 20 individuals from a suburban population (more than 20 km from the city limits), caught mainly on the outskirts of villages. DNA was extracted from fresh or 96% ethanol-fixed ear or digit tissue (live-captured animals) or from similarly fixed muscle tissue of dead animals obtained from the owners of agricultural plots.
Total DNA was isolated using the Diatom™ DNA Prep kit (Isogen Lab, Moscow, Russia) according to the manufacturer's instructions. The resulting DNA solutions were stored at -18°C.
To amplify exon 2 of the DRB gene, we used the primers proposed by Smulders et al. (2003): forward, 5'-GAGTGTCATTTCTACAACGGGA-3'; reverse, 5'-CTCTCCGCGGCACAAAGGAA-3'. Primer specificity was verified beforehand by PCR followed by Sanger sequencing of the resulting product. The resulting sequences corresponded to those reported by Smulders et al. (2003) but contained a considerable number of ambiguous (double-peak) positions.
Determination of the nucleotide sequences of individual alleles was carried out in "JSC Evrogen Lab." based on targeted sequencing of PCR products obtained using the primers above on the Illumina platform.
Individual libraries were prepared in accordance with the protocol described in the manual "16S Metagenomic Sequencing Library Preparation" (Part # 15044223 Rev. B; Illumina), with an increased number of amplification cycles in the first stage of PCR (35 cycles). After the amplicons were obtained, the libraries were purified and mixed equimolarly using the SequalPrep™ Normalization Plate Kit (ThermoFisher, Cat # A10510-01). Quality control of the resulting library pools was carried out using the Fragment Analyzer system, and the pools were quantified by qPCR.
The library pool was sequenced on an Illumina MiSeq using the MiSeq Reagent Kit v2 Nano (250 bp reads from both ends of the fragments, 500 cycles). A PhiX phage library was used to control the sequencing parameters.
FASTQ files were generated using bcl2fastq v2.17.1.14 Conversion Software (Illumina). At the first stage of analysis, the forward and reverse reads in the paired FASTQ files of each sample were merged using the BBMerge tool from the BBTools v38.87 package (Bushnell et al., 2017). Read pairs that could not be merged unambiguously without mismatches were discarded. The merged reads were aligned to a reference, one of the DRB exon 2 sequence variants known from the common hamster (Crcr-DRB1*14 allele, GenBank AJ490324; Smulders et al., 2003), using Bowtie2 (Langmead and Salzberg, 2012) with the local option. Reads at least 240 bp long were extracted from the resulting alignments; the reference sequence is 246 bp long (including primer annealing sites). Only samples that yielded at least 50 such sequences were used for further analysis.
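The length and depth thresholds described above can be illustrated with a minimal Python sketch. It assumes that the merged, aligned reads are already available as plain strings grouped by sample; the function name `filter_samples`, the dictionary layout, and the toy data are hypothetical and are not part of the original pipeline, which relied on BBMerge and Bowtie2.

```python
def filter_samples(reads_by_sample, min_len=240, min_reads=50):
    """Keep reads of at least `min_len` bp and drop samples with too few of them.

    reads_by_sample: dict mapping a sample name to a list of read sequences
    (hypothetical layout, used here only for illustration).
    """
    kept = {}
    for sample, reads in reads_by_sample.items():
        # Length threshold: at least 240 bp against a 246 bp reference.
        long_enough = [r for r in reads if len(r) >= min_len]
        # Depth threshold: a sample is analysed only with >= 50 such reads.
        if len(long_enough) >= min_reads:
            kept[sample] = long_enough
    return kept


# Toy data: sample "U01" passes, sample "R01" has too few long reads.
toy = {
    "U01": ["A" * 246] * 60 + ["A" * 200] * 10,
    "R01": ["A" * 246] * 49 + ["A" * 100] * 30,
}
filtered = filter_samples(toy)
```

In this toy example only "U01" is retained, with its 60 full-length reads.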
In the set of sequences obtained for each sample, unique variants (alleles) were identified and their frequencies were calculated. Variants were retained for further analysis if they were (a) represented by at least 10 reads; (b) made up at least 5% of the total number of sequences obtained for that individual; and (c) had a read count of at least 25% of that of the most frequent allele in the sample.
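The three retention criteria can be written down as a short Python sketch. This is only an illustration under the assumption that all reads of one individual are held in a list of strings; the function name `call_alleles` and the toy read counts are hypothetical.

```python
from collections import Counter


def call_alleles(reads, min_count=10, min_fraction=0.05, min_ratio_to_top=0.25):
    """Return {variant: read count} for variants passing criteria (a)-(c)."""
    counts = Counter(reads)
    if not counts:
        return {}
    total = sum(counts.values())   # all sequences obtained for the individual
    top = max(counts.values())     # read count of the most frequent variant
    return {
        seq: n
        for seq, n in counts.items()
        if n >= min_count                  # (a) at least 10 reads
        and n >= min_fraction * total      # (b) at least 5% of all reads
        and n >= min_ratio_to_top * top    # (c) at least 25% of the top variant
    }


# Toy individual: variants A and B pass; C fails (c); D fails (a) and (c).
toy_reads = ["A"] * 60 + ["B"] * 30 + ["C"] * 12 + ["D"] * 4
alleles = call_alleles(toy_reads)
```

Here variant C has 12 reads, which satisfies (a) and (b) but is below 25% of the 60 reads of the most frequent variant, so only A and B are retained.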
From the sequences that met these conditions, the sites corresponding to the annealing regions of the forward (22 bp) and reverse (20 bp) primers, as well as two initial positions, were removed so that the first position of the sequence matched the first position of a codon. Identical allele sequences found in different individuals were identified using the FaBox 1.61 online service (Villesen, 2007).
Sequence variants selected for subsequent analysis, as well as their corresponding amino acid sequences, were checked for compliance with the expected genome site using the BLAST function on the NCBI website (https://blast.ncbi.nlm.nih.gov).
The read counts of different alleles within a sample are to some extent affected by random deviations during sequencing, and no data are available on the possible copy number of this locus (gene multicopy) in the common hamster. Therefore, individual genotypes were not inferred from the ratio of allele frequencies in the resulting files. The only exception was homozygous genotypes, which were recorded for individuals in which only one variant in the resulting set of sequences met the above conditions. The two samples (urban and suburban populations) were then compared on the basis of the set of alleles found in each sample, together with the number of animals whose genotype carried each allele.
These samples were compared in Arlequin v.3.11 (Excoffier et al., 2005) using the Fst and Φst statistics, and haplotype and nucleotide diversity indices were estimated for each sample.
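For orientation, the two diversity indices can be sketched in Python using the commonly used unbiased estimators (gene diversity from allele frequencies, and nucleotide diversity as the mean number of pairwise differences per site). This is an assumption about the general form of the indices, not a reproduction of Arlequin's implementation, which may differ in detail; the function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def haplotype_diversity(counts):
    """Unbiased gene diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two allele observations are required")
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def nucleotide_diversity(counts):
    """Mean pairwise differences per site between equal-length sequences."""
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two allele observations are required")
    seqs = list(counts)
    length = len(seqs[0])
    total = 0.0
    for a, b in combinations(seqs, 2):
        diffs = sum(x != y for x, y in zip(a, b))  # uncorrected distance
        total += 2 * (counts[a] / n) * (counts[b] / n) * diffs
    return n / (n - 1) * total / length


# Toy sample: two 4-bp "alleles", each observed in two animals.
toy_counts = {"AAAA": 2, "AAAT": 2}
h = haplotype_diversity(toy_counts)
pi = nucleotide_diversity(toy_counts)
```

The input mirrors the way the samples are described above: each allele is paired with the number of animals in which it was found.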
The median network of MHC class II nucleotide sequence variants (DRB) was constructed using the Median Joining method using the Network 4.612 program (Bandelt et al., 1999). The resulting variants of nucleotide sequences were translated, the number of variants of amino acid sequences and their correspondence to the alleles of the gene were determined for each sample, as well as for the entire studied population.
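The translation step and the matching of alleles to amino acid variants can be sketched as follows. The example assumes in-frame sequences and the standard genetic code; the function names and the 6-bp toy "alleles" are hypothetical and serve only to show how synonymous alleles collapse into one protein variant.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate an in-frame nucleotide sequence; '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def group_by_protein(alleles):
    """Map each amino acid sequence to the list of allele names encoding it."""
    groups = {}
    for name, seq in alleles.items():
        groups.setdefault(translate(seq), []).append(name)
    return groups


# Toy alleles: a1 and a2 differ by a synonymous substitution (GAG/GAA, both Glu).
toy_alleles = {"a1": "GAGTGT", "a2": "GAATGT", "a3": "GATTGT"}
protein_groups = group_by_protein(toy_alleles)
```

In the toy data, three nucleotide variants correspond to two amino acid variants, because a1 and a2 encode the same peptide.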
Sites potentially responsible for antigen binding were identified in the amino acid sequences based on the data of Brown et al. (1993), following the approach used earlier for representatives of the genus Peromyscus (Richmond and Davey, 2003).
The nucleotide sequences in the samples were analyzed for the number and ratio of non-synonymous and synonymous substitutions using MEGA X (Kumar et al., 2018). In the same program, a Z-test was performed to assess the effect of positive selection on the diversity of amino acid sequences. The test was based on the modified Nei-Gojobori method with a transition-to-transversion ratio of 2.0, and its variance was estimated by a bootstrap procedure with 10000 replicates. The analysis was performed both for the complete sequences we obtained (201 bp, 67 codons) and for the antigen-binding regions alone (57 bp, 19 codons).
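The logic of the test can be illustrated with a small Python sketch. It assumes the usual form of the codon-based Z statistic, Z = (dN - dS) / sqrt(Var(dN) + Var(dS)), with dN, dS, and their bootstrap variances already computed elsewhere (for example, in MEGA); the one-sided p-value from the normal approximation, the function name, and the numerical values are assumptions made for illustration and are not results of this study.

```python
import math


def z_test_positive_selection(dn, ds, var_dn, var_ds):
    """One-sided Z-test of H0: dN = dS against HA: dN > dS.

    dn, ds: mean non-synonymous and synonymous distances.
    var_dn, var_ds: their variances (e.g., estimated by bootstrap).
    Returns (Z, one-sided p-value under the normal approximation).
    """
    z = (dn - ds) / math.sqrt(var_dn + var_ds)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
    return z, p


# Hypothetical input values, chosen only to show the calculation.
z, p = z_test_positive_selection(dn=0.30, ds=0.10, var_dn=0.005, var_ds=0.005)
```

A positive Z with a small p-value indicates an excess of non-synonymous over synonymous substitutions, which is the pattern expected under positive selection.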

RESULTS
The resulting files containing from 53 to 2996 sequences of the required length were obtained for 17 individuals caught in the city and 19 from rural populations. For each sample, from 15 to 606 variants of nucleotide sequences were obtained and from 1 to 4 of them met the conditions listed above. After the removal of the primer sites, all sequences recognized as valid were 201 bp long and did not contain deletions or insertions that could shift the reading frame, as well as stop codons. In total, 25 variants (alleles) were noted in the two studied samples, the sequences of which were deposited in the GenBank (Table 1). During the check using the BLAST, all of these variants were found to correspond to the MHC class II DRB1 gene in rodents. For 21 variants, the maximum similarity (91.1-99.4%) was noted with 6 sequences known for Cricetus cricetus (AJ490311, AJ490314, AJ490316-317, AJ490319, AJ490320, AJ490322-323), for 3 variants (Crcr-DRB*25, Crcr-DRB*26 and Crcr-DRB*29 from the urban population) with the sequence known for Apodemus flavicollis (JQ858341, similarity 93.3-93.8%), for the Crcr-DRB*27 allele (urban population) with the Rattus norvegicus sequence (AY626204, similarity 93.8%). The use of the BLAST function for the amino acid sequences corresponding to these alleles also showed their correspondence to the beta chain of MHC class II antigens in rodents.
Significant differences were found between the two studied samples both in the frequency of alleles (Fst = 0.0316, p = 0.022) and in the pairwise differences in nucleotide sequences (Φst = 0.09027, p = 0.0001).
The number of alleles found in the genotype of one animal varied from 1 (homozygosity, noted in most individuals) to 4. Characteristics of sample diversity are presented in Table 2. The values of all diversity indices were noticeably higher in the urban population than in the rural one.
The occurrence of individual alleles in the samples is shown in Fig. 1. In the suburban population, the frequency of one allele (Crcr-DRB*15, found in 47.4% of individuals) markedly exceeded that of the others, each of which was found in no more than 15.8% of individuals (alleles Crcr-DRB*16, Crcr-DRB*17 and Crcr-DRB*19; the rest were even less frequent). In the urban population, no such dominance of any allele was observed. The maximum frequency here (23.5% of individuals) was noted for the Crcr-DRB*19 allele, and three more alleles (Crcr-DRB*15, Crcr-DRB*25, and Crcr-DRB*26) were each found in 17.6% of individuals. On average, each allele was observed in only 7.5% of individuals in the urban population and in 5.7% of individuals in the rural population.
The majority of alleles (73.7% in the urban and 54.5% in the rural population) were unique to one population. Of the 25 alleles found in our study, only 5 (20%) were common to both samples, and the rest were found in only one population each (Figs. 1, 2). Two of the shared alleles (Crcr-DRB*15 and Crcr-DRB*19) were relatively common in both populations, whereas the other three were found in only 1-2 individuals in one or both populations. The median-joining network of allele sequences in the urban and rural populations is shown in Fig. 3. The two most frequent alleles common to both samples are located in the center of the network. Two other shared alleles are also located close to the center. One of the two branches is represented by alleles found in either the urban or the rural population; the other is represented by unique variants found only in the urban population. However, specific phylogenetic relationships between individual variants cannot be identified in most cases because of the large number of unresolvable nodes.
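The percentages above follow from simple set arithmetic, which the short Python sketch below reproduces. It takes the number of shared alleles (5) from this paragraph and the per-population allele counts (19 urban, 11 rural) from the Discussion; the function name `allele_sharing` is hypothetical.

```python
def allele_sharing(n_urban, n_rural, n_shared):
    """Summarise allele sharing between two populations (values in percent)."""
    total = n_urban + n_rural - n_shared  # shared alleles are counted once
    return {
        "total_alleles": total,
        "urban_unique_pct": 100 * (n_urban - n_shared) / n_urban,
        "rural_unique_pct": 100 * (n_rural - n_shared) / n_rural,
        "shared_pct": 100 * n_shared / total,
    }


summary = allele_sharing(n_urban=19, n_rural=11, n_shared=5)
print(summary["total_alleles"])              # 25
print(f"{summary['urban_unique_pct']:.1f}")  # 73.7
print(f"{summary['rural_unique_pct']:.1f}")  # 54.5
print(f"{summary['shared_pct']:.1f}")        # 20.0
```

That is, 14 of the 19 urban alleles and 6 of the 11 rural alleles were not found in the other population.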
Most of the nucleotide sequence variants (23 out of 25) corresponded to distinct protein variants (Fig. 4). There were two exceptions. The Crcr-DRB*25 allele, which is present only in the urban population, encodes the same amino acid sequence as the Crcr-DRB*15 allele, which occurs in both populations. The Crcr-DRB*18 allele, unique to the rural population, encodes the same amino acid sequence as Crcr-DRB*22, which also occurs in both populations. It should be noted that the Crcr-DRB*25 and Crcr-DRB*18 alleles were observed in only one animal each.
The number of non-synonymous substitutions in the allele sequences significantly exceeded the number of synonymous ones both in each of the studied samples and in total (Table 3). The value of dN/dS was higher in the sample from the urban population than in that from the rural population, both for the full-length sequences and for the antigen-binding regions alone.
Within the amino acid sequences encoded by the sequenced regions of exon 2 of the DRB gene, 19 of the 67 amino acid residues were presumably responsible for antigen binding (Fig. 4).
Significant support for the hypothesis of positive selection was found only for the antigen-binding regions, in the pooled sample and in the sample from the urban population, but not in the sample from the rural population (Table 3).

DISCUSSION
Genetic diversity is one of the most important factors in the survival of populations. Its study should cover not only neutral genetic markers (including microsatellite loci) but also genes responsible for the adaptive characteristics of the organism, in particular MHC genes (Pfrender et al., 2000; Reed and Frankham, 2001; Hedrick, 2001; McKay and Latta, 2002; Luikart et al., 2003; Palo et al., 2003; Gomez-Mestre and Tejedo, 2004; Ujvari et al., 2005; Ujvari and Belov, 2011; Shiina et al., 2015). The latter are responsible for the functioning of the immune system under infectious and parasitic load, and polymorphism of these genes may indicate the health of populations (Biedrzycka et al., 2011). The diversity of MHC genes is an important factor in predicting the fate of endangered animal populations in the wild (Ujvari and Belov, 2011; Shiina et al., 2015).
An increase in the polymorphism of MHC alleles in an individual genotype is facilitated by the multicopy nature of the genes of this complex, i.e., the presence of more than one locus of a certain type. This phenomenon is widespread in mammals: horses (Fraser and Bailey, 1996), primates (Bontrop et al., 1999; Khazand et al., 1999), cattle (Lewin et al., 1999), various feline species (Kennedy et al., 2002), and sea lions (Bowen et al., 2004). MHC genes are also multicopy in rodents, in particular in gerbils (Gerbillurus paeba) (Harf and Sommer, 2005) and beavers (Castor fiber pohlei) (Babik et al., 2005). Whether the DRB gene is multicopy in the common hamster has remained an open question. In the study of European populations, none of the 70 samples was found to have more than two alleles (Smulders et al., 2003). However, that study used a cloning method in which no more than 10 molecules (clones) per sample were amplified and sequenced. In our case, sequencing a much larger number of amplicons showed that the genotype of a common hamster may contain up to 4 different sequences of exon 2 of the DRB gene, found in roughly equal proportions.
Living in urban environments is often accompanied by a depletion of the genetic diversity detected by neutral markers (Kajdacsi et al., 2013; Chiappero et al., 2011; Munshi-South et al., 2014; Feoktistova et al., 2016, 2019). This is related both to the limited number of founder individuals that have adapted to urban conditions and to the limited and isolated habitable territories, which contributes to increased inbreeding. However, in urban conditions animals become targets for a large number of unusual pathogens and parasites owing to the high density of synanthropic species and various stress factors (Gliwicz et al., 1980; Luniak, 2004). Therefore, the accumulation of a higher diversity of genes responsible for immune characteristics is extremely important for the survival of synurbic species.
As we mentioned in the introduction, the common hamster is a species included in the IUCN Red List because of a sharp decline in the number of natural populations. Studies based on neutral markers (both mtDNA and microsatellite loci) showed a dramatic decrease in the genetic diversity of its populations in the extreme western part of the range, while in populations from Eastern Europe, diversity for these markers remains quite high (Smulders et al., 2003; Neumann et al., 2004, 2005; Banaszek et al., 2011; Reiners et al., 2014).
The only study evaluating the allelic diversity of MHC genes in common hamster populations was also carried out for natural populations in the western part of the species range (Smulders et al., 2003). A sharp decline in diversity, with the number of alleles of exon 2 of the DRB gene reduced to as few as one, was revealed in the populations of France and Holland, while the analysis of museum samples showed that as recently as the beginning of the last century at least 7 more alleles occurred in these populations. The sharp decrease in DRB allelic diversity in the extreme western part of the species range is associated with a reduction in average body weight and a deterioration in reproduction and in resistance to parasitic loads. At the same time, in the modern population from the Czech Republic (Eastern Europe), 15 animals had 13 different alleles of the DRB gene, and most of the animals were heterozygous (Smulders et al., 2003). Given that the common hamster has been actively populating cities over the last 50 years, the question arose: how healthy are the synurbic populations of this species? Previously, we studied the distribution of mitochondrial lineages in populations of the common hamster in a number of cities in the Caucasus and Crimea (Feoktistova et al., 2016, 2019) and showed that the diversity of these lineages is lower in cities. However, unique haplotypes, unknown outside the city, were also found in urban populations. Thus, in the territory of Simferopol, only three of the seven haplotypes of the combined sequence of the cytochrome b gene and the control region present on the Crimean Peninsula were found. Furthermore, two of them were unique to this city.
Our analysis of the allelic composition of the DRB gene showed that, on the contrary, it is the urban population that is characterized by an increased diversity of this gene in all indicators (the number of alleles reported for the population, an average number of alleles in the genotype of an individual, and indexes of haplotype and nucleotide diversity calculated for the population). The number of alleles (11) in rural populations of the common hamster on the Crimean Peninsula was somewhat lower than for the prosperous non-urban population of the Czech Republic (13), but for Simferopol this value (19) was significantly higher. Although the number of non-synonymous substitutions exceeded the number of synonymous ones in both samples from the Crimean Peninsula, the ratio of dN/dS and the effect of the positive selection in the urban population were markedly higher than in the rural one.
The obtained results allow us to assume that the representatives of the urban population of Simferopol are well "prepared" to withstand the "urban challenges," having greater resistance to infections and parasitic load. Most likely, this is a consequence of selection in the specific conditions of the urban environment.