Genetic and genomic diversity of NheABC locus from Bacillus strains

Non-hemolytic enterotoxin (NHE), a tri-partite, proteinaceous toxin encoded by contiguous nheA, nheB and nheC genes of Bacillus cereus sensu lato (B. cereus s.l.), is considered to be associated with the foodborne diarrheic syndrome. However, B. cereus s.l. includes a number of closely related strains, and the occurrence of NHE among them, and other members of Bacillus is unclear. Consequently, we aimed to determine the distribution and evolution of NHE within Bacillus by confirming the presence of the nheA, B and C sequences and variation within them using published data, and to analyze the genomic and genetic diversity. The phylogenetic tree of NHE proteins (NheA, NheB and NheC) from 81 different B. cereus s.l. strains was constructed. And on the genetic determinants of the NHE toxin did not bring any obvious link between the nheABC genes sequence of a strain and its virulence in the diarrhoeal pathogenesis. Analysis of the genomic diversity of the nheA, B and C loci revealed that their upstream regions were more conserved than the downstream sequences. Multilocus sequence typing schemes (MLST) based on seven concatenated housekeeping genes and nheA, B and C genes of the 75 strains were developed. The neighbor joining phylogenetic tree based on seven housekeeping genes together with nheA, B and C genes was similiar to published Bacillus phylogenetic trees. And on the genetic determinants of the NHE toxin did not bring any obvious link between the nheABC genes sequence of a strain and its virulence in the diarrhoeal pathogenesis.The results indicate that nheA, B and C genes do not affect the diversity of housekeeping genes, and this specific NHE protein does not participate in the diarrheic syndrome.


Introduction
Bacillus cereus s. l. is widely distributed in food, soil and plants (Okinaka and Keim 2016). This Gram-positive, spore-forming bacterium may behave as an opportunistic human pathogen. Long known to be responsible for two forms of food poisoning, characterized by either diarrhea or nausea and vomiting. The diarrhoeal symptom includes the following symptoms, which usually last generally less than 48 h: abdominal pain, profuse watery diarrhea, sometime nausea and vomiting within 8-16 h. Although most cases are generally mild, more serious and even lethal cases have been reported in Europe.
Concerning the diarrhoeal syndrome, no definitive hypothesis that correlates the symptoms to a unique component exists to date. Indeed, several putative enterotoxins have been reported to be potentially responsible, alone or in combination, of the diarrhoeic pathotypes. These include the tripartite enterotoxins as Haemolysin BL (HBL) and Nonhaemolytic enterotoxin (NHE), but also the singlecomponent toxin, Cytotoxin K (CytK) also sometimes named Haemolysin IV (HlyIV). In addition to these three major candidates, other molecules are also cited as potential enterotoxins involved in the diarrhoeic syndrome, such as enterotoxin FM (EntFM) (Boonchai et al. 2008), enterotoxin S (entS), enterotoxin-T (bceT) and pore-forming haemolysins like the Cereolysin O (CerO), Haemolysin II (HemII) and Haemolysin III (HlyIII). Besides, other virulence factors seem to contribute to the B. cereus foodborne diseases, such as phospoholipases or the sphingomyelinase (SMase).
The non-hemolytic enterotoxin NHE is encoded by nheA, nheB and nheC . NHE is a tripartite pore-forming toxin that requires the combination of three proteins: NheA, NheB and NheC. NHE was first isolated from the supernatant of a B. cereus s.l. culture that caused a large food poisoning outbreak in Norway in 1995 (Lund and Granum 1996). NHE proteins are secreted independently and maximal toxic activity on Vero cells requires all the three parts in a molar ratio 10:10:1 of NheA, NheB and NheC, respectively. NheB is the binding component of the enterotoxin complex and an increase in the concentration of NheC results in a decrease in Nhe toxic activity (Lindbäck et al. 2004).
However, the genes coding for these putative enterotoxins are, for the most part, largely distributed among the B. cereus group isolates, irrespective of their diarrheic activities (McIntyre et al. 2008;Swiecicka et al. 2006). Moreover, with the exception of the rabbit ileum assay, no animal model can be used to specifically test for the diarrhoeal properties of the strains (or for purified proteins) (Beecher et al. 1995). Only classical assays on animal cell lines are readily available, but they only give information on the generic cytotoxicity of these putative enterotoxins (Jeßberger et al. 2014).
B. cereus s.l. includes a number of closely related species, viz. Bacillus cereus sensu stricto, Bacillus mycoides, Bacillus pseudomycoides, Bacillus thuringiensis, Bacillus weihenstephanensis, Bacillus anthracis and Bacillus cytotoxicus, and many strains thereof (Lindbäck et al. 2004). The occurrence of NHE within the group, and other members of Bacillus is unclear, e.g., we have found NHE in some non-B. cereus s.l. members. Analysis methods such as MLST indexes the sequence variation present in a small number (usually seven) of housekeeping gene fragments located around the bacterial genome, is to provide a highly discriminating typing system that can be particularly helpful for the typing of bacterial pathogens (Keith and Martin 2014). And PHYLOViZ software make the data easy to be visualized and export the results in graphic formats (Alexandre et al. 2012). In this study, we were interested in determining the particular strains that produce NHE and to trace the molecular evolution and variation of the nhe-ABC genes and the phylogenetic relationship of the various Bacillus cereus s.l. strains to others within Bacillus. Our approach involved sequence-based typing analysis, interrogation of online databases of allelic profiles and associated epidemiological data collected.
With the aim of assessing the potential implication of NHE in the diarrheic syndrome, the genomic and genetic diversity, as well as the occurrence and the evolutionary ecology of nheABC were studied in detail on a collection of Bacillus cereus strains.

Bioinformatics and strains information
The presence of nheABC genes and their corresponding putative sequences was screened in the NCBI database among the 174 genomes from the B. cereus group as available (Aug. 15, 2015). The NheA, B and C protein sequences from B. cereus ATCC14579 were applied to find all the homologous proteins among the 174 strains by BLAST program in http://blast.ncbi.nlm.nih.gov/Blast.cgi.
A set of strains (Table 1) coming from food poisonings (FP), food products (F), the environment (E), clinical cases (C) and unknown origin (U) was collected to investigate the presence of nheABC genes. Then, a panel of positive strains was selected for MLST analysis (Table 2 and below). This selection was based on the origin, the year and the country of strains and the species to obtain the most diversified panel of nheABC positive strains.

MLST analysis
Seven loci encoding housekeeping genes were chosen for MLST analysis: glp (glycerol kinase), gmk (guanylate kinase), ilv (isoleucine-valine), pta (phosphate acetyltransferase), pur (purine synthesis), pyc (pyruvate carboxylase), tpi (triosephosphate isomerase) (Lampe and English 2016). 75 strains for which the nheABC genes are available in the genome databases, were selected for the MLST analysis ( Table 2). The MLST result were shown in minimum-spanning tree (http://pubmlst.org/analysysis/). The nucleotide sequence diversity of glp, gmk, ilv, pta, pur, pyc, tpi and nheABC genes was analyzed at two levels: by constructing a phylogenetic tree for each loci with the CLC Main Workbench 7 software (ClCbio, a Qiagen Company) using the neighbor joining (NJ) algorithm with Jukes Cantor as substitution rate model, and by building sequence types (ST) of the various strains using the non-redundant database (NRDB) for allele comparison (http://pubmlst.org/).
To cluster the strains according to their ST, BURST analysis was performed which defines a group when at least 5/7 loci were identical (http://pubmlst.org/perl/mlstanalyse/). Sequences of the seven chromosomal loci were also concatenated with or without adding the nheABC genes to construct and compare the respective phylogeny trees generated by CLC Main Workbench 7 software using UPGMA algorithm and Kimura 80 mathematical model using. The correctness of the results was evaluated using a 100-step bootstrap test (Virginie et al. 2015).

Results and discussion
nheABC occurrence among the B. cereus s.l. strains While the mechanisms involved in the B. cereus diarrhoeal pathogenesis are still largely unknown, a large panel of enterotoxins have been designated as potential causative agents led to this syndrome. And NHE is regularly cited as a candidate because of its cytotoxic, necrotic and haemolytic activities on human intestinal cell lines (Lindback et al. 2004;Zhu et al. 2016). Some B. cereus genomes harbor the nhe operon which codes for the cytolytic protein NheA and the binding components NheB and NheC (Wehrle et al. 2009 (Table 1). The nheABC operon occurrence in NCBI database is 46.6% (81/174). And the mean value frequency of nheA, nheB, nheC found in the literature is 82, 81 and 78%, respectively (Swiecicka et al. 2006;Gaviria et al. 2000;Hansen and Hendriksen 2001;Banerjee et al. 2011;Moravek et al. 2004;De Jonghe et al. 2010;Krause et al. 2010;Zhou et al. 2010;Samapundo et al. 2011;Chon et al. 2015). These frequencies are slightly higher than the ratio we measured from database. Maybe because more and more Bacillus genomes sequencing are completed, and nheABC operon is absent from the recently released genomes, so the ratio measured from database is low.
Genomic diversity of nheABC 81 genetic regions (30 kb in size) from the nheABC + strains mentioned above centered on the nheABC operon   were collected and aligned for genomic diversity analysis. The putative ORFs (more than 200aa) from all fragments were annotated (Table 3), and compared to illustrate their genetic features. By using the nheABC locus of B. cereus ATCC14579 as a typical reference, good conservation was observed in the upstream region of this gene cluster among the other 76 strains. Interestingly, the rest four strains showed different pattern of conservation, therefore, we studied the genomic diversity in two parts.
In the first part, 30 kb DNA sequences centered on the nheABC loci from 76 strains were aligned and labelled in the sketch map. The alignment result showed a higher degree of conservation in the upstream region in terms of gene content, relative to the downstream region (Fig. 1) In the nheABC locus upstream regions (left part of nheABC loci in Fig. 1), seven genes (two-component response regulator vanR, C terminus of sensor histidine kinase, M24/M37 family peptidase, manganase transport protein MntH, sulfur transferase, hypothetical protein Detailed allelic profiles for the seven housekeeping genes (glp, gmk, ilv, pta, pur, pyc, tpi) are given for the ST (Sequence Type). The sequences of strains are from NCBI databases. Numbers were arbitrary assigned to allele fragment for each locus. The STs were grouped by BURST analysis: 75 strains were divided into seven groups based on the number of differences in the allelic profiles (Table 1) Abbreviations are as follows: C clinical isolates, E environmental isolates, F food isolates, FP food poisoning isolates, U strains with unknown origin ST stands for Sequence Type and corresponds to the specific allelic profile and 2-dehydropantoate 2-reductase) were present in all the 76 fragments. Interestingly, in 49 of the 76 strains, a 505aa ORF (amino acid permease) was found immediately upstream of the NheA, while it was broken up into two smaller ORFs (also annotated as amino acid permease) in the 32 remaining strains, including which included 31 B. anthracis strains plus B. thuringiensis 97-27 strain.
In the nheABC locus downstream regions (right part of nheABC loci in Fig. 1), there was obvious genes constituent change and rearrangement in different branch. The gene rhtB (homoserine/threonine efflux protein) was conserved in Branch A-D, but disappeared in Branch E and F. The four genes cluster deoR (deoxyribonucleoside regulator), deoC (deoxyribose-phosphate aldolase), nupC (nucleoside transporter), pdp (pyrimidine-nucleoside phosphorylase) was well kept in the same direction in Branch B-F. And the gene yndJ encoded a putative membrane protein also maintained in Branch D and E. The other genes showed different variety. In the downstream regions, the composition and arrangement of genes diversified in different Branch strains.
But the rest five nheABC + strains, B. cereus ATCC4342, D17, FT9, G9241 and B. cytotoxicus  and B. mycoides Rock 3-17 had almost the same genes distribution. It is easy to recognize gene cluster rearrangement and gene insertion in B. anthracis Ames and B. cereus NVH391-98, though these two strains shared high similarity in most genes rearrangement.
NheB genetic diversity seemed more conservative. The size of 79 NheB proteins was 402aa, and the size of the rest two strains (B. cytotoxicus NVH391-98, B. cereus CI) was 401aa. Based on sequences diversity, all the 81 NheB sequences were divided into three groups (Fig. S2). In group A, the NheB from different strains shared 99-100% identities with B. cereus ATCC14579-NheB. In group B, the NheB from strains B. mycoides ATCC6462 shared 98% identity with B. cereus ATCC14579-NheB. In group C, the 401aa NheB proteins from B. cereus NVH391-98 shared 87% identity with B. cereus ATCC14579-NheB.
NheC genetic diversity analysis showed much less conservation, compared to NheA and NheB. The NheC size from the 81 strains ranged from 305aa to 397aa. Most of the proteins (74 out of 81) were 359aa. The size of the rest seven strains, such as Ba Vollum, CDC684,Han,B. cereus 4342,FT9 and B. thuringiensis AI Hakam was 305aa,305aa,353aa,353aa,362aa,362aa,397aa and 362aa, respectively. Based on sequences diversity, all the 81 NheC were divided into three groups (Fig.  S3). In group A, the NheC from different strains shared 94-100% identities with B. cereus ATCC14579-NheC. In group B, the NheC proteins from Bw WSBC10204 and KBAB4 both shared 92% identity with B. cereus ATCC14579-NheC. In group C, the NheC from Ba Han shared 86% identity with B. cereus ATCC14579-NheC. In group D, the NheC protein from B. cereus NVH391-98 shared 73% identity with B. cereus ATCC14579-NheC.
In all the analysis of NheA, B and C genetic diversity, the strain B. cereus NVH391-98 was always exclusively different from other strains. This is in agreement with the observations performed on the genomic diversity of nhe-ABC loci (Böhm et al. 2015).

Seven housekeeping genes MLST analysis from 75 nheABC + strains
To further characterize the sequence variation among the 75 nheABC positive strains of B. cereus, MLST was performed using the seven housekeeping genes: glp, gmk, ilv, pta, pur, pyc and tpi. The sequence variability of each locus was also studied in details. Based on the allelic profiles of the seven loci, ST could be defined for all the isolates. The high number of ST is another illustration of the high diversity existing among the nheABC positive B. cereus s.l. strains. Of note, Table S1 displays the correspondence between the allelic profiles defined in this work and those reported in previous MLST schemes (Helgason et al. 2004;Sorokin et al. 2006). The 75 isolates (Table S1) were then subjected to a BURST analysis to group the strains according to the similarity of their allelic profile. The isolates were grouped together when five out of the seven analyzed loci were identical. Based on this criterion, seven clusters were formed, as shown in Table 2. 75 strains were clustered in seven groups. Strains coming from foodborne outbreaks spread in different groups. This result showed that there was no obvious association between similar allelic profiles and geographic or source origins.
To visualize the influence of nheABC gene on the relatedness of the strains, 75 strains displaying nheABC positive (originating from different countries) were selected for the MLST analysis of the Bacillus strains. A tree was built with the sequences of the seven loci concatenated with nheABC. PHYLOViZ was built upon the goeBURST implementation (available at http://goeburst.phyloviz.net) and it allows to integrate and display multiple sources of information. The result was showed in Fig. 3. The whole tree was built on all the selected species about 2033 Bacillus cereus strains from different sources (data not shown), and the 75 selected strains assigned black color were located in this split tree. Though the observed frequency of each strain was different, all the selected strains spread randomly in all the part of MLST split tree. The number of locus differences between each pair strains could be observed from the line distances of the tree. This explained that the nheABC sequences variation had no favor of any type of strains, and also showed  Table S1, the blue and green circles were 2033 Bacillus cereus strains (data not shown). The numbers in the circles were ST of each strains. a the whole tree. b the partial enlargement of A again that there was no obvious association between similar allelic profiles and geographic or source origins.
To verify that nheABC operon was not prone to lateral transfer, 30 kb genome fragments centered on the nheABC operon locus were investigated. Although the genomic neighborhood of nheABC showed some variability, no element potentially involved in horizontal transfer was found (data not shown). The comparison of the 76 Bacillus fragments has provided valuable insights into defining the key genetic complement of the organisms, which forms the basic genetic support to define the organism's pathogenicity ability (Lisdawati et al. 2015).
To investigate the genetic diversity of nheABC and seven housekeeping genes (glp, gmk, ilv, pta, pur, pyc and tpi), the concatenated sequences of the seven housekeeping genes (I), nheA, B and C genes (II), the concatenated sequences of the seven housekeeping genes plus the nheA, B and C genes (III) were collected from 35 B. cereus s.l. group strains for which the seven genes sequences were all available in NCBI database. The sequences were then analyzed by online software clustalw (http://www.genome.jp/ tools/clustalw/). The neighbor joining trees based on seven housekeeping genes together with nheA, B and C genes were built. In tree I (Fig. 4), B. cereus NVH391-98 was far away from all the other 35 strains. Genes from different strains clustered together, according to the genetic relationship five main clusters are noted as A-E. In tree II and III, though the order of five main clusters changed, the constituent of each cluster stayed the same. The structure of these trees were similiar to published Bacillus phylogenetic trees (Virginie et al. 2015). This result indicated that no striking differences was observed between the various trees, and meant that the influence of nheABC on strain clustering is limited. This result also showed that the genetic determinants of the NHE had no any obvious relationship with the nheABC genes sequence of a strain and its virulence in the diarrhoeal pathogenesis.
In this study, we found that the genetic determinants of the NHE toxin did not bring any obvious link between the nheABC genes sequence of a strain and its virulence in the diarrhoeal pathogenesis. To assess whether NHE is a significant factor in this disease, a transcriptomic study should be considered to take the genes expression of the toxin into account. And the potential action of NHE also should be investigated in concert with other possible enterotoxins (e.g., HBL, CytK or HlyII) and other virulence factors to evaluate the diarrhoeic potential of B. cereus strains. To elucidate the actual involvement of these molecules in the diarrhoeal syndrome, it is necessary to find an adequate animal model. In fact, due to their proteinaceous nature, these putative enterotoxins may be prone to a rapid inactivation in the intestinal tract, unless they would be released by the B. cereus cells in the immediate vicinity of host's intestinal epithelium being protected by the mucus layer. However, this hypothesis has still to be verified.

Conclusions
The nheABC genes do not affect the diversity displayed by housekeeping genes, and this specific protein is probably not implicated in the diarrheal syndrome. Our data provide Fig. 4 The NJ trees built based on the concatenated sequences of the six housekeeping genes and nheABC genes. I (based on six housekeeping genes), II (based on nheABC genes) and III (based on the concatenation of six housekeeping genes and nheABC genes) a scientific basis for us to know more about nheABC loci in food poisoning B. cereus.