Allelic Polymorphism of Anthrax Pathogenicity Factor Genes as a Means of Estimating Microbiological Risks Associated with Climate Change

Climate change brings new risks of emergence of especially dangerous diseases. The paper reports the possibility of assessing the pathogenic potential of bacteria as demonstrated by studying the allelic polymorphism of anthrax bacterium pathogenicity factor genes, which is a prerequisite for assessing the associated microbiological risks. The allelic polymorphism of the capBCADE operon (capB, capC, capA, capD, and capE genes) encoding the capsule biosynthesis proteins of Bacillus anthracis, and the acpA and acpB genes encoding the expression regulators of this operon have been studied for the first time. A number of single nucleotide polymorphisms (SNPs) were described in the strains of the studied sample, including 5 SNPs in the capB gene, 3 in capC, 4 in capA, 14 in capD, 2 in capE, and 15 in acpB, as well as 7 SNPs and one insertion in the acpA gene. As a result, the sample has been divided into sequence types for each gene and 17 genotypes, which are combinations of the identified sequence types. In silico translation of the detected alleles of the studied genes revealed three isoforms of the CapB and CapA proteins, two isoforms of the CapC and CapE proteins, six isoforms of the CapD protein, five isoforms of the AcpA protein, and four isoforms in the AcpB protein. It has been demonstrated that the SNP in the 351A → G position of capC is a marker of A.Br.Aust94 group strains. Based on the results, A.Br.Vollum group strains were divided into two subgroups. The strains in the evolutionary lines B and C differed from the line A strains by the presence of an 853G → A SNP in the acpA gene. In addition, a previously unknown variable number tandem repeat (VNTR), has been found in the acpA gene and the possibility of using it for differentiating and genotyping of B. anthracis strains has been demonstrated.


INTRODUCTION
Climate change scenarios predict an increase in average annual temperatures worldwide. These climatic changes may have important consequences in terms of new microbiological risks. Specific features of the radiation-heat balance make the Arctic more susceptible to climate changes than other regions. The Arctic ecosystem being a reservoir of microbial genetic diversity represents a virtually unlimited source of microorganisms which may interact with humans. Increased temperatures, melting permafrost and sea ice, and the associated biosphere transformations, in particular the acceleration of microbiological processes, could lead to previously geographically restricted diseases suddenly emerging in economically active Arctic zones. There are risks that microorganisms may be released from permafrost, which people may not be prepared to meet with, including anthrax. Little information is currently available on the occurrence of virulent phenotypes in the Arctic zone. Studying microbial phenotypes allows assessing factors influencing the virulence of bacterial strains in the Arctic environment, as well as the potential risks associated with the changes taking place in polar areas in the face of climate change [1].
Anthrax pathogen, Bacillus anthracis, is a sporulating Gram positive bacterium that primarily infects ungulates but can also be transmitted to other warmblooded animals including humans [2].
The distinctive feature of B. anthracis is the presence of two virulence plasmids, pXO1 and pXO2, which determine host specificity and bear the genetic determinants of the key pathogenicity factors [3]. The pXO1 plasmid contains the pagA, lef, and cya genes, which encode anthrax toxin components, namely, the protective antigen, lethal, and edema factors [4]. The рХО2 plasmid contains the capBCADE operon, which encodes the enzymes participating in the synthesis of the B. anthracis capsule made up of poly-γ-D-glutamic acid [5]. Cap operon genes go in the following order: capB, capC, capA, capD, and capE and encode the corresponding B. anthracis proteins, namely, CapB, CapC, CapA, CapD, and CapE [6]. These proteins are presumably responsible for capsule biosynthesis as well as for the transport and attachment of D-glutamic acid residues to the surface of bacterial cells [7,8]. The acpA and acpB genes encoding the AcpA and AcpB proteins with partially overlapping functions are also located on the pXO2 plasmid. AcpA and AcpB independently positively control the transcription of the cap operon and many other genes [5,9]. B. anthracis strains containing both plasmids in their genome are virulent for both humans and animals.
The ability to form spores underlies the key features of the B. anthracis life cycle. As spores, bacterial cells are capable of remaining viable for tens or hundreds of years [2]. Apart from soil foci formation at the sites of anthrax death, there are also risks of obtaining anthrax-infected animal material, such as wool or hides, and spreading it over long distances through human activity. All this has resulted in the general spread of anthrax across areas with highly developed livestock industry. In addition, there is a hypothesis that anthrax spores spread anthropogenically along the routes of livestock movement during the Mongolian military campaigns [10].
Increased risks of anthrax outbreaks in the Arctic zone of the Russian Federation associated with climate change are of particular importance. Long-term persistence of spores in soil and on household items makes anthrax outbreaks (especially given climatic changes) and in certain cases, major epizootics possible, as it was the case in the summer of 2016 on the Yamal Peninsula. This outbreak resulted in the death of more than 2000 reindeer and 1 person, as well as in the illness of 10 more people [10,11]. Another interesting development was the discovery of mummies of cave lion cubs in Yakutia, with at least three strains of B. anthracis having been found at different depths in the soil beneath their discovery site [10].
Climate change makes it necessary to study the underlying causes of outbreaks and to comprehensively investigate this microorganism in order to prevent the spread of the disease. One aspect of this research is the investigation of the possible origins of the strain that caused the disease outbreak. Several molecular diagnostic methods (phylogenetic studies) are used to determine the origin of a microorganism strain.
In the case of anthrax pathogen, it is important to bear in mind that this microorganism emerged relatively recently (approximately 12000-26000 years ago) [12] and spends most of its life cycle as a spore. As a consequence, B. anthracis is characterized by low genetic polymorphism, which makes it impossible to divide it into subspecies and makes phylogenetic studies challenging. In addition, B. anthracis along with other members of the Bacillus genus is a member of the Bacillus cereus group, which includes nine genetically very closely related species [13]. This makes it important to differentiate the anthrax pathogen from other species and to identify the differences between its strains.
Several genotyping methods are now available to draw the genetic profile of a strain. One of the key methods is canSNP genotyping, which involves the identification of nucleotides in 14 positions on the B. anthracis chromosome, the so-called canonical SNPs (canSNPs). Using this method the global anthrax pathogen population was divided into three main evolutionary branches A, B, and C including 14 canSNP groups with some of them being associated with certain geographical locations [14,15]. Multiple Locus VNTR Analysis (MLVA), which is based on identification of the number of tandem repeats in the known VNTR loci, most of which are located on the chromosome, may be named as the second most used method [16]. Several loci located on the plasmids have been suggested for use despite their short repeat length of two to three base pairs (bp), which makes it necessary to use expensive equipment to determine the number of repeats in these loci. Together, these methods make it possible to rapidly obtain information on the phylogenetic status of the strain in question with high precision [14].
Since B. anthracis is a pathogenic microorganism, it is only logical that its pathogenicity factors should be studied to the fullest extent. For example, the role of different amino acid residues in pathogenicity factor protein sequences and the polymorphism of genes encoding these factors in the global population of this species is one of the important research directions. This approach would not only contribute to the study of pathogenicity factor protein structures but would also allow the obtained data to be used to develop methods for bacterial strain genotyping using the polymorphisms in the studied genes. There are currently no studies that would describe the polymorphism of genes encoding pathogenicity factors located on the рХО2 plasmid in B. anthracis strains.
Objective-To describe polymorphism of the cap-BCADE operon genes encoding capsule biosynthesis proteins and the acpA and acpB genes encoding capsule biosynthesis regulation proteins using whole genome sequencing data of B. anthracis and to use the Bioinformatics analysis. Whole genome sequencing reads were assembled into the genome nucleotide sequences with the aid of the Lasergene software package (Dnastar, United States) (dnastar.com). The genome of the B. anthracis Ames Ancestor strain (GenBank: GCA_000008445.1) was used as a reference genome.
Searching for the nucleotide sequences from the B. anthracis and B. cereus strains, whose genomes are deposited in GenBank, was performed using BLAST (http://www.ncbi.nlm.nih.gov/blast).
Translation in silico of nucleotide sequences and multiple alignments were carried out with the aid of the MEGA 7.0 software package (http://www.megasoftware.net).
Phylogenetic (cluster) analysis was performed in MEGA 7.0. Phylogenetic trees (dendrograms) were constructed using the unweighted pair group method with arithmetic mean (UPGMA). Joined in silico nucleotide sequences of the studied genes were used for phylogenetic analysis. MLVA genotyping. MLVA genotyping was performed according to the protocol described in [17] using primers designed in the present study.
PCR to amplify the VNTR locus was carried out using the Taq DNA Polymerase kit (Thermo Fisher Scientific, United States). PCR mixture contained 10× PCR buffer, 2.5 mM MgCl 2, 0.2 mM dNTP, 10 pmol of each primer, 1 U of Taq DNA polymerase, 1 μL of DNA solution in TE buffer for each of the studied strains (5-20 ng), and Н 2 О to get a final volume of 25 μL. A TE buffer was used as a negative amplification control. PCR was carried out in a Т100 Thermal Cycler (Bio-Rad, United States).
The length of the amplified fragments was determined by gel electrophoresis. Results were obtained by counting the number of repeats at the VNTR locus. Electrophoresis was performed in 3% agarose gel (Helicon, Russia) in 0.5× TBE-buffer with subsequent EtBr staining (100 mg/L). Results were registered using the ECX-F15.L transilluminator (Vilber Lourmat, France), Vzglyad gel documentation system (Helicon, Russia), and IC Measure software (Imaging Source Europe GmbH, Germany) at 254 nm.
Amplified fragment sizes were determined using the EZ Load 20 bp Molecular Ruler ladder (Bio-Rad, United States) with the aid of the PhotoCamptMw 99.04 software program (Vilber Lourmat, France).

RESULTS AND DISCUSSION
Allelic polymorphism. A total of 77 strains of anthrax pathogen were studied, including strains isolated at the site of a major reindeer epizootic on Yamal Peninsula (Russia) in the summer 2016 and the strains isolated from permafrost in Yakutia [10,11]. B. cereus strains containing pXO2-like plasmids (B. cereus bv. anthracis CI and BC-AK) were also included in the sample. These strains are capable of causing anthraxlike disease in humans and great apes [18][19][20].
For the strains in the studied sample, the sequences of genes located on the pXO2 virulence plasmid, namely, capBCADE operon genes (capB, capC, capA, capD, and capЕ) responsible for B. anthracis capsule synthesis and acpA and acpB genes encoding capsule biosynthesis regulation proteins were assembled using the obtained whole-genome sequencing data.
The results allowed us to identify and describe mutations and to split the sample into sequence types (STs). In total, nine STs for the capD gene  which consists in the presence of certain mutations, are listed in Tables S1-S6. For each gene, the sequence type to which the Ames Ancestor reference strain belongs is designated ST1. Type numbering was performed in decreasing order of the number of B. anthracis strains included within it, followed by the ST to which the B. cereus strains belonged. All STs differed between themselves by nucleotide substitutions or the insertion, as is the case with the acpA gene.
To assess the phenotypic expression of the identified nucleotide polymorphism, or in other words, whether the nucleotide substitution at each identified position is synonymous or results in an amino acid substitution in the corresponding protein, or protein inactivation due to a stop codon, in silico translation of the nucleotide sequences into amino acid sequences was carried out (Tables S7-S12). The coordinates of the amino acid substitutions are given for complete translated protein sequences, disregarding their posttranslational modifications, which have been described for the CapD protein only and consists in the cleavage of the 28 amino acid N-terminal signal sequence [21]. 3D structures of the proteins encoded by the genes studied here have not been described with the exception of CapD [21,22], therefore amino acid substitutions are shown without indication in which protein domain they are located.
The distribution of the strains in the studied sample over the translated CapB, CapC, CapA, CapD, AcpA, and AcpB protein types is given in Tables S13-S18. The assignment of strains to protein types was carried out in the same way as it was done in the case of nucleotide STs. In total, six CapD protein isoforms, five AcpA protein isoforms, four AcpB protein isoforms, three isoforms of each of the CapB and CapA proteins, and two isoforms of each of the CapC and CapE proteins were described.
The tables do not contain data for the capE gene since the studied sample appeared to be monomorphic with the only exception for the B. cereus BC-AK strain, which revealed the presence of two SNPs, namely, 49T → G (17L → V) and 138C → T (synonymous). The observed low variability may be partially accounted for by the small length of the capE gene (only 144 bp), as given the same frequency of mutation across the entire cap operon, the probability of mutation in such a small part of it is rather low.
The distribution of strains in the studied sample across genotypes (GTs), each of them including a particular combination of ST genes under study, is shown in Table 2. The genotypes were numbered according to the principle specified above. GT, which included the reference strain was designated GT1. The remaining GTs were numbered in decreasing order of the number of strains in which they were described, the last numbers having been assigned to the GTs characteristic of B. cereus. A total of 17 GTs (D = 0.8908 [0.8604-0.9212]) were described.
A phylogenetic tree showing the relationships between the identified GTs is shown in Fig. 1.
In summary, the present study described the allelic polymorphism of B. anthracis capsule formation genes and cap-operon expression regulation genes located on the pXO2 plasmid. Their STs were identified and strains in the study sample were clustered based on their GT, which represented a set of STs for individual genes. In other words, we virtually used multi-virulence-locus sequence typing (MVLST) to study the anthrax pathogen. Earlier, this method was first used by the authors to study this microorganism using the sequences of toxin formation genes located on the pXO1 plasmid. This pipeline was called MVLST pXO1 genotyping. Accordingly, the genotyping pipeline suggested in this study may be called MVLST pXO2 , while the identified GTs, may be referred to as MVLST pXO2 -GT.
Analyzing the studied sample clustering based on MVLST pXO2 -GTs demonstrated that MVLST pXO2 profiles correlated well both with the species identity (they were different for B. anthracis and B. cereus) and evolutionary line divergence within the B. anthracis species. However, using MVLST pXO2 we were not able to accurately divide the sample into the groups corresponding to the canSNP groups that the strains belong to. At the same time [24] reported a strong correlation between MVLST pXO1 -GT of the strain and the canSNP group, which it belongs to, as well as, in some cases, their association with a geographical zone. It should be noted that MVLST pXO2 genotyping based on gene sequences localized on the pXO2 plasmid had lower resolution (D = 0.6148 [0.488-0.7416]) compared to the similar approach, which used genes located on the pXO1 plasmid (D = 0.8951 [0.8656-0.9245]) [24].
In the A lineage, MVLST pXO2 -GT1 is the most common genotype, which includes strains belonging to almost all canSNP groups. Other GTs in lineage A differed from GT1 by one or two SNPs and contained a single strain or combined small subgroups of strains within the same canSNP group.
The exception was the Korean H9401 strain (canSNP group A.Br.005/007), which differed from MVLST pXO2 GT1 by two SNPs, namely, capD 796G → A (266V → I) and acpB 495A → G (synonymous). However, it is rather doubtful that these two markers are specific to the A.Br.005/007 group, as they are just as likely to be specific to the H9401 strain [25]. The presence of these two mutations in other strains in the A.Br.005/007 group could not be verified because at the time of preparing the manuscript, no whole genome sequencing data were publicly available for the strains in this canSNP group.
Apart from the H9401 strain, two more strains showed the presence of unique markers, namely, the Tangail As stated above, in several cases, not only individual strains, but also small groups of strains within canSNP groups in lineage A showed unique markers. For example, MVLST pXO2 -GT3 genotype included four strains belonging to the A.Br.Aust94 group, which were isolated in the Caucasus region, namely, in the Stavropol krai, Chechen Republic, and Azerbaijan. These strains form an individual GT based on the presence of the unique synonymous SNP capC 351A → G (capC-ST2). However, this marker is absent in strain 1199 in the same A.Br.Aust94 group isolated in the same region, namely, in Dagestan.
To verify whether this capC 351A → G SNP is specific for the "Caucasian" A.Br.Aust94 strains, its presence was additionally tested in the genomes of 23 strains in this group isolated in different areas located outside the Caucasus region (Table 3). We were not able to detect this SNP in any of the strains included in this additional sample. Thus, it may be suggested that the A.Br.Aust94 capC 351A → G sub-group was formed in the Caucasus region and circulate there together with its "parental" form, and the capC 351A → G SNP may be used as a marker the presence of which in the strain indicates its region of origin.
The A.Br.Vollum canSNP group was distributed between three GTs in the studied sample. Two strains isolated in Central Asia belong to GT1, a common one for lineage A. Four American strains belonging to GT5 (n = 3) differed from GT1 by the presence of the unique acpB 1381 A → G (461I → V) SNP. GT5 is characterized by the presence of only this substitution. GT12 (n = 1) contains another unique SNP acpB 563C → T (188S → L) in addition to this substitution. In a further search for the presence of these SNPs, we identified nine strains in the additional A.Br.Vollum sample with different geographical origins belonging to GT1 (n = 4) and GT5 (n = 8) (Table 4). Thus, we may at least say that the A.Br.Vollum group has split into two subgroups with different GTs: GT1 (which includes both strains in this group which were isolated in the former Soviet Union) and GT5. GT12 is apparently a strain-specific genotype. Markers characteristic of the B and C lines are worth mentioning separately. It has been demonstrated that all the studied strains in lineages B and C differ from lineage A strains by the presence of the common SNP acpA 853G → A.
The strains in the B.Br.CNEVA canSNP group isolated in Central Europe have a synonymous capD 234T → C SNP. However, the only strain in this group originating from the SRCAMB (Obolensk, Russia) did not reveal the presence of this mutation. Unfortunately, there is no data left on where this strain was isolated, so the presence or absence of this capD 234T → C SNP cannot be considered as a conclusive marker of geographical origin. At the same time, the presence of this strain in SRCAMB (Obolensk, Russia) still very likely indicates that this strain was isolated in the former Soviet Union. In this case there may be some reason to assume that this SNP does have phylogeographic relevance. However, strain 44 is the only available strain in the B.Br.CNEVA group for which the isolation site is not in Central Europe, which made it impossible to carry out any additional study on the distribution of the capD 234T → C SNP among B.Br.CNEVA group strains isolated outside this region.
Evolutionary lineage С (canSNP group C.Br.001) is represented in the studied sample by two strains (2002013094 and 2000031021) isolated in the United States. These strains have the same nucleotide sequences for all the genes studied and therefore fall into the same GT according to the results of phylogenetic analysis of joined sequences. According to the results of the analysis carried out in this study, these strains proved to be phylogenetically closer to B. cereus than the other strains in the studied sample. The following SNPs are common between these strains and nonanthracis (B. cereus) strains: One of the aims of this study was to examine the anthrax pathogen as an infection, whose outbreaks may potentially be triggered by climatic changes, such as permafrost melting leading to the growth of spores previously trapped in frozen soil layers [10]. Previously, several researchers have suggested the possibil- ity of outbreaks of several diseases, including anthrax, caused by climatic changes [26][27][28]. Strains isolated during the Yamal outbreak, as well as one of the strains isolated from the frozen soil layers in Yakutia, were previously assigned to evolutionary lineage B. Several mutations, which are phy-logenetically significant markers, have been described in the genomes of these strains here and in the earlier studies [23,24]. The discovery of anthrax bacilli in the Holocene alluvial deposits in Yakutia is of much interest [10]. The most likely genetic dating, though, indicated that these strains were conserved between  the 13th and 16th centuries. The geological position of the find below the seasonal thawing layer may probably indicate that they are several thousand years old. If the latter is true, then it appears that anthrax bacteria (apparently, as spores) are able to prevent the accumulation of mutations in accordance with the molecular clock hypothesis during their longterm preservation in permafrost leading to the effect of "curiously modern DNA for an ancient bacterium" [29]. The study of such strains is valuable not only in terms of the risks of outbreaks resulting from climate change, but also for better understanding of the evolutionary history of anthrax pathogen and other bacilli. MVLST pXO2 genotyping of 40 B. anthracis and 2 B. cereus strains, whose genomes are deposited in GenBank (Table 1) revealed the presence of a nine bp insertion in the acpA gene in several strains belonging to lineages В and С, which at closer look proved to be part of an imperfect tandem repeat ATA**GATA. It has been demonstrated that the number of tandem repeating units is three in evolutionary lineage A strains, while it is four in lineage B strains belonging to the B.Br.001/002 (n = 3) and B.Br.CNEVA (n = 4) groups and in lineage С. In such a way, we identified a previously unknown VNTR locus on the pXO2 plasmid of the anthrax pathogen genome, which we named VNTRacpA.
The structure of tandem repeats itself determines the specific nature of the most likely mutations at these loci, which are insertions and deletions of the certain repetitive sequence. Furthermore, the frequency of these events is significantly higher than the average frequency of other mutations in the genome [30,31]. Therefore, the presence of only two alleles of this tandem repeat in the acpA gene in the sample of 42 phylogenetically different strains suggests its low variability. It may be assumed that four repeats of this motif were characteristic of the B. anthracis 'ancestral genome' prior to its segregation into geographically and genetically distinct groups. It is possible that reducing the repeat fold to three resulted in an AcpA protein variant, which provided the corresponding strain with some selective advantage, for example, more efficient regulation of pathogenicity factor expression, with the archaic four-repeat variant being preserved only in the B and C strains (Table S5).
Previously, several VNTR loci, including those located on the pXO2 plasmid, have been suggested for MLVA genotyping of the anthrax pathogen. However, the loci located on this plasmid represent clusters of 2-3 bp long repeats, with expensive equipment thus being needed to determine their length [32]. The locus described here represents a region containing three or four 9 bp long tandem repeats, which makes it possible to separate the amplified fragments in the agarose gel.
The results showed that the identified polymorphism allows using the number of repeats at this locus as a diagnostic marker to differentiate between the evolutionary lineages. Therefore, we assessed the possibility of using VNTRacpA for MLVA genotyping in this study.  Unfortunately, the method of nucleotide sequence assembly using whole genome sequencing data employed in this study did not allow the number of repeats in the VNTRacpA region to be unambiguously determined in the strains from SRCAMB (Obolensk, Russia) including those belonging to lineage B. For this reason we designed PCR primers flanking the described repeat region and tested them on the DNA isolated from the SRCAMB (Obolensk, Russia) B. anthracis strains included in the studied sample. We used the MLVA genotyping approach, which involved the amplification of the VNTRacpA locus and the amplified fragments separation in the agarose gel. Figure 2 shows an example of using the VNTRacpA locus for MLVA genotyping of B. anthracis strains and differentiating evolutionary lineages А and В. The results allowed us to differentiate between the strains in evolutionary lines A and B with high confidence and confirmed the hypothesis of the presence of two tandem repeat alleles, specific to the two evolutionary lines of the anthrax pathogen. However, the analysis revealed that the theoretical length of the amplified fragments differed from the observed one. In particular, the theoretical length of the fragment containing three tandem repeats, was 245 bp, and of the fragment containing four repeats, 254 bp (Fig. 2). The observed lengths were 300 and 309 bp.

CONCLUSIONS
To summarize, our findings demonstrate that the segregation of the studied sample into the MVLST pXO2 -GTs corresponds to its segregation into the main B. anthracis evolutionary lineages. The capC 351A → G SNP is the marker of the A.Br.Aust94 group strains circulating in the Caucasus region. MVLST pXO2 genotyping subdivided the A.Br.Vollum group strains into two subgroups one of which was isolated in the former Soviet Union. The absence of the capD 234T → C SNP in the B.Br.CNEVA group strains may be a possible marker indicating that the strain originated from the former Soviet Union. The possibility of using the VNTRacpA locus to differentiate between anthrax pathogen strains belonging to evolutionary lines A and B by MLVA genotyping was demonstrated. This approach enables the results to be obtained in a short time and allows avoiding errors associated with the assembly of sequencing results. The described genotypes allow evaluating factors associated with microorganism virulence under certain environmental conditions and identifying potential risks caused by climate change.

FUNDING
The study was supported by the Sectoral Program of the Federal Service for Supervision of Consumer Protection and Welfare.

COMPLIANCE WITH ETHICAL STANDARDS
The authors declare that they do not have conflicts of interest. No experimentation involving animals or humans was performed by any of the authors.

SUPPLEMENTARY INFORMATION
The online version contains supplementary material available at https://doi.org/10.1134/S0003683822040056.

OPEN ACCESS
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.