Introduction

Bacillus thuringiensis is a ubiquitously distributed, rod-shaped, Gram-positive, spore-forming, facultatively anaerobic bacterium [1, 2]. It has been isolated from various ecological niches, including soil, aquatic habitats, phylloplane and insects [3,4,5,6,7]. The defining property of the species is the ability to produce parasporal protein crystals consisting of δ-endotoxins, which are predominantly encoded on plasmids [1, 8, 9]. These proteins are toxic towards a wide spectrum of invertebrates of the orders Lepidoptera, Diptera, Coleoptera, Hymenoptera, Homoptera, Orthoptera and Mallophaga, and towards other organisms such as Gastropoda, mites, protozoa and especially nematodes [7, 10,11,12]. In addition, B. thuringiensis produces further toxins such as Cyt, Vip, and Sip toxins [13]. Cry toxins represent the largest group and can be subdivided into three different homology groups. In total, over 787 different Cry toxins have been identified, each exhibiting toxicity against a specific host organism [14]. It has been shown that B. thuringiensis strains can produce more than one Cry toxin, resulting in a broad host range. As such, B. thuringiensis has been used widely as a biopesticide in agriculture for several decades [1, 2, 8, 13, 15, 16]. Bacillus thuringiensis is a member of the genus Bacillus, which comprises low GC-content, Gram-positive bacteria with a respiratory metabolism and the ability to form heat- and desiccation-resistant endospores [11, 17, 18]. Within this genus, B. thuringiensis is a member of the Bacillus cereus sensu lato (Bcsl) species group, which originally contained seven different species (B. cereus, B. anthracis, B. thuringiensis, B. mycoides, B. pseudomycoides, B. weihenstephanensis, B. cytotoxicus [17,18,19,20,21,22,23,24,25]). Historically, mostly pathogenic and phenotypic properties were used for strain classification. 
However, recent publications utilizing genomic criteria suggest that the species group should be extended by the species B. toyonensis [26, 27]. Moreover, the three proposed species "Bacillus gaemokensis" [28], "Bacillus manliponensis" [29] and "Bacillus bingmayongensis" [30] have been isolated and effectively published. However, these names had not yet appeared on a Validation List at the time of publication [31]. Due to the very close phylogenetic relationships, it has also been proposed to assign the eleven species to a single extended Bcsl species [32, 33]. The genome of Bcsl members contains a highly conserved chromosome with regard to gene content, sequence similarity and genome synteny, while variation can be observed within mobile genomic elements such as prophages, insertion elements, transposons, and plasmids [34]. Due to the significance of Bcsl group members in human health, the food industry and agriculture, resolving the phylogeny is of great importance. Because the 16S rRNA genes are highly conserved, the classical 16S phylogeny of Bcsl strains is inconclusive. Thus, a combination of 16S and a seven-gene multi-locus sequence typing (MLST) scheme has been used to establish taxonomic relationships within species of the Bcsl group [35, 36]. Comparative genomics of the cry gene loci has revealed remarkable proximity to elements of genome plasticity such as plasmids, transposons, insertion elements and prophages [2, 37,38,39]. The activity of these mobile elements has resulted in a multitude of highly diverse plasmid sizes through rearrangements such as deletions and insertions, as well as migration of cry genes into the bacterial chromosome [40]. The worldwide distribution of B. thuringiensis and its capacity to adapt to a diverse spectrum of invertebrate hosts is explained by the formation of spores and a remarkable variability in crystal protein families [13]. 
This toxin arsenal, especially the copy number of individual toxin genes, can be shaped by reciprocal co-adaptation with a nematode host, as previously demonstrated using controlled evolution experiments in the laboratory [41, 42]. The B. thuringiensis strain MYBT18246 described herein and its host Caenorhabditis elegans have been selected as a model system for such co-evolution experiments [41]. One aim of this sequencing project was to provide a high-quality reference genome sequence for the original B. thuringiensis MYBT18246 in order to obtain a detailed phylogeny and shed light on the evolution of this microparasite, with a particular focus on the presence of virulence factors, elements of genome plasticity and host adaptation factors. Here we present the genome of the nematicidal B. thuringiensis MYBT18246 and its comparative analysis to the three closest relatives identified by MLST phylogeny.

Organism information

Classification and features

Bacillus thuringiensis belongs to the genus Bacillus, was first isolated at the end of the nineteenth century [17, 20] and has been used as a biocontrol agent for several decades [7, 18, 21]. Strain B. thuringiensis MYBT18246 is a Gram-positive, rod-shaped and spore-forming bacterium (Fig. 1a), like most B. thuringiensis strains [7]. Bacillus thuringiensis MYBT18246 was isolated in the Schulenburg lab by AS from a mixture of genotypes present in the strain NRRL B-18246, originally provided by the Agricultural Research Service Patent Culture Collection (United States Department of Agriculture, Peoria, IL, USA) [43,44,45]. As a member of the species B. thuringiensis, B. thuringiensis MYBT18246 is a facultative anaerobe, is motile and is able to produce parasporal crystal toxins, the characteristic feature of this species [2]. Growth occurred at temperatures ranging from 10 to 48 °C, and optimal growth was observed at mesophilic temperatures ranging from 28 to 37 °C [46]. The pH range of B. thuringiensis strains varies from pH 4.9 to 8.0, with the optimum documented as pH 7 [47, 48]. Strain B. thuringiensis MYBT18246 forms flat, opaque colonies with undulate, curled margins and produces crystals during the stationary phase (Fig. 1a-b). Characteristic features of B. thuringiensis MYBT18246 are listed in Table 1.

Fig. 1

Microscopic characteristics of Bacillus thuringiensis MYBT18246. a Light microscopy of Gram-stained B. thuringiensis MYBT18246 cells (40×). b Phase-contrast microscopy of sporulated and Cry toxin-producing cells of B. thuringiensis MYBT18246 (40×)

Table 1 Classification and general features of B. thuringiensis MYBT18246 [54]

Extended feature descriptions

The cell size of Bacillus thuringiensis can vary from 0.5 × 1.2 μm to 2.5 × 10 μm [11]. Categorization into the group of Gram-positive organisms was confirmed by Gram staining, as shown in Fig. 1a. In Fig. 1b the production of Cry toxins can be observed. These toxins accumulate next to the endospore during the sporulation phase and form phase-bright inclusions [7]. Bacillus thuringiensis MYBT18246 exhibited 99% 16S rRNA sequence identity to other published Bcsl members [49]. As a result of this high sequence similarity, phylogenetic differentiation of B. thuringiensis MYBT18246 from other Bcsl group members based on 16S rRNA is not possible (Fig. 2a). As an alternative, 23 B. thuringiensis strains and a representative of each of the Bcsl group species were chosen for phylogenetic analysis using multi-locus sequence typing (MLST) as previously developed by Priest [36] (Fig. 2b). Bacillus subtilis subsp. subtilis str. 168 was selected as an outgroup to root the tree [17, 18]. The phylogenies were generated using the Neighbor-Joining method [50] and evolutionary distances were computed by the Maximum Composite Likelihood method [51]. In total, 217 MLST gene sequences were compared with 1000 bootstrap replicates. Phylogenetic analysis was conducted in MEGA7 [52]. All reference sequences used were retrieved from GenBank hosted at NCBI.
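
The bootstrap step of such an analysis can be illustrated with a minimal Python sketch. It is not the MEGA7 procedure: the uncorrected p-distance stands in for the Maximum Composite Likelihood distance, the three aligned sequences are invented toy data, and the function names `p_distance` and `bootstrap_replicate` are hypothetical. In a full analysis, a Neighbor-Joining tree would be built from each replicate and the support for each node counted across replicates.

```python
import random

def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring alignment columns with a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}

# Toy alignment (invented sequences, not MLST data from this study).
alignment = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACGTAT",
    "strain_C": "ACGAACGTTT",
}

rng = random.Random(42)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(alignment, rng) for _ in range(1000)]

# Distance between two strains in the original data and across replicates.
observed = p_distance(alignment["strain_A"], alignment["strain_C"])
resampled = [p_distance(r["strain_A"], r["strain_C"]) for r in replicates]
mean_resampled = sum(resampled) / len(resampled)

print(f"observed p-distance A-C: {observed:.2f}")
print(f"mean over {len(replicates)} replicates: {mean_resampled:.2f}")
```

Resampling whole columns keeps the homology between sequences intact, which is why the replicate alignments can be fed to the same distance and tree-building steps as the original data.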

Fig. 2

Phylogenetic tree highlighting the taxonomic relation of B. thuringiensis MYBT18246 (red) based on a) 16S rDNA amplicon within the Bacillus clade and b) multi-locus sequence typing within the Bacillus cereus sensu lato species group. GenBank accession numbers are given in parentheses. The comparison includes strains of the Bacillus clade or Bcsl group members (blue). Paenibacillus larvae subsp. larvae DSM 25430 or Bacillus subtilis subsp. subtilis str. 168 was used as an outgroup to root the tree. Sequences were aligned using ClustalW 1.6 [91, 92]. The phylogenetic tree was constructed using the Neighbor-Joining method [50] and evolutionary distances were computed by the Maximum Composite Likelihood method [51] within MEGA7.0 [52]. Numbers at the nodes are bootstrap values calculated from 1000 replicates

Genome sequencing information

Genome project history

Bacillus thuringiensis MYBT18246 was used in a co-evolution study with a Caenorhabditis elegans host. The original strain MYBT18246 was selected for sequencing in order to generate a reliable reference sequence for subsequent experiments [41, 42]. The genome sequence was analyzed to identify virulence factors and fitness factors contributing to the efficient infection of C. elegans. Additionally, the phylogenetic position of B. thuringiensis MYBT18246 in the Bcsl group was determined [41]. The complete genome sequence has been deposited in GenBank under the accession numbers CP015350-CP015361 and in the Integrated Microbial Genomes database under the Taxon ID 2671180122 [53]. A summary of the project information and its association with MIGS version 2.0 compliance [54] is shown in Table 2.

Table 2 Project information

Growth conditions and genomic DNA preparation

Genomic DNA was isolated from B. thuringiensis MYBT18246 using the DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) for 454 pyrosequencing [55] and the Genomic-Tip 100/G Kit (Qiagen, Hilden, Germany) for Single Molecule Real-Time (SMRT) sequencing [56] according to the manufacturer’s instructions. For SMRT sequencing, the protocol "Procedure and Checklist: Greater than 10 kb Template Preparation Using AMPure PB Beads" was followed and blunt-end ligation was applied overnight. Whole-genome sequencing was performed using a 454 GS-FLX system (Titanium GS70 chemistry; Roche Life Science, Mannheim, Germany) and on one SMRT Cell on the PacBio RSII system using P6 chemistry (Pacific Biosciences, Menlo Park, CA, USA).

Genome sequencing and assembly

A summary of the project information can be found in Table 2. 454 pyrosequencing was carried out at the Institute of Clinical Molecular Biology in Kiel, Germany, and SMRT sequencing at the DSMZ Braunschweig. First, approximately 331,000 454 reads with an average length of 600 bp were assembled using the Newbler 2.8 de novo assembler (Roche Diagnostics), resulting in 729 contigs with a coverage of 18×. Repeats were resolved and gaps between contigs were closed using PCR with Sanger sequencing of the products with BigDye 3.0 chemistry and an ABI3730XL capillary sequencer (Applied Biosystems, Life Technology GmbH, Darmstadt, Germany). Manual editing in the Gap4 (version 4.11) software of the Staden package [57] was performed to improve the sequence quality. For final gap closure, PacBio sequencing was used. A total of 27,870 PacBio reads with a mean length of 14,053 bp were assembled using HGAP 2.0 [58], resulting in a coverage of 50×, with further analysis using SMRT Portal (v2.3.0) [59]. Finally, both assemblies were combined, resulting in 12 contigs including a closed circular chromosome sequence of 5,867,749 bp. Eight additional contigs exhibited overlapping ends and were circularized to plasmid sequences ranging from 6.3 kb to 150 kb (Table 3). The assembly was checked for drops in coverage and for extremes of the GC, AT, RY and MK disparities. Moreover, we determined the origin of replication of B. thuringiensis MYBT18246 by comparative analysis with the OriC of eight other B. thuringiensis strains available in DoriC [60, 61]. These strains varied in chromosome size from 5.2 Mb to 5.8 Mb, but all shared a similar GC content of 35%. In all strains, including B. thuringiensis MYBT18246, two OriC regions were identified using the ORF-Finder [62]. One region was highly conserved with regard to OriC length (178/179 nt), OriC AT content (~0.69) and number of DnaA boxes (4). 
The second region varied in OriC length (564–767 nt) and OriC AT content (~0.67–0.7), but all had the same number of DnaA boxes (9). B. thuringiensis MYBT18246 showed the highest OriC similarities with both OriC regions of B. thuringiensis Bt407.
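
The disparity check mentioned above can be sketched in Python as follows. This is only an illustration under stated assumptions: the conventional cumulative definitions are used (GC = G − C, AT = A − T, RY = purines − pyrimidines, MK = amino − keto bases), which may differ in detail from the tool the authors applied, and the function names and the toy sequence are hypothetical. In many bacterial chromosomes, the extremes of such curves lie near the origin and terminus of replication, so unexpected extra extremes can hint at misassemblies.

```python
def cumulative_disparities(sequence):
    """Return cumulative GC, AT, RY and MK disparity curves for a DNA sequence."""
    curves = {"GC": [], "AT": [], "RY": [], "MK": []}
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for base in sequence.upper():
        if base in counts:  # ambiguous bases (e.g. N) leave the counts unchanged
            counts[base] += 1
        a, c, g, t = (counts[b] for b in "ACGT")
        curves["GC"].append(g - c)              # G versus C
        curves["AT"].append(a - t)              # A versus T
        curves["RY"].append((a + g) - (c + t))  # purines versus pyrimidines
        curves["MK"].append((a + c) - (g + t))  # amino versus keto bases
    return curves

def extremes(curve):
    """Return the positions (0-based) of the first minimum and first maximum."""
    return curve.index(min(curve)), curve.index(max(curve))

# Toy sequence: a G-rich half followed by a C-rich half.
toy = "GGGGCCCC"
curves = cumulative_disparities(toy)
gc_min_pos, gc_max_pos = extremes(curves["GC"])
print(curves["GC"])            # [1, 2, 3, 4, 3, 2, 1, 0]
print(gc_min_pos, gc_max_pos)  # 7 3
```

For a real replicon the curves would be computed over the full sequence (or over windows) and plotted against position.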

Table 3 Summary of genome: one chromosome and 11 plasmids

Genome annotation

Annotation was performed with Prokka v1.9 [63] using the manually curated Bacillus thuringiensis strain Bt407 [64] as a species reference and a comprehensive toxin protein database (including Cry, Cyt, Vip, and Sip toxins) as feature references. The Prokka pipeline was applied using Prodigal for gene calling [65]. RNAmmer 1.2 [66] and Aragorn [67] were used for rRNA gene and tRNA identification, respectively. Additionally, signal leader peptides were identified with SignalP 4.0 [68] and non-coding RNAs with an Infernal 1.1 search against the Rfam database [69]. The annotation of cry toxin genes was manually corrected, and the genes were named according to the standards of the Cry toxin nomenclature by Crickmore [70]. Identified toxins were deposited at the Bacillus thuringiensis Toxin nomenclature database [14].

Genome properties

The genome of B. thuringiensis MYBT18246 consists of 12 replicons, including a circular chromosome of 5,867,749 bp (Table 3). The GC content of the chromosome is 35% and the GC content of the plasmids ranges from 32 to 37%. The total number of protein-coding genes is 7089, with 6092 genes on the chromosome and 997 genes on the plasmids. The genome harbors 12 rRNA clusters, 111 tRNA genes, 5274 predicted protein-coding genes with assigned function and 1815 genes encoding proteins with unknown function (Table 4). All gene products have been assigned to COGs (Table 5). The genome sequence of B. thuringiensis MYBT18246 is available in GenBank (CP015350 for the chromosome and CP015351–CP015361 for the plasmids).
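
The gene counts reported above are internally consistent, which a few lines of Python can confirm. The sketch only re-uses the numbers given in the text; the variable names are arbitrary.

```python
# Gene counts as reported in the text above.
total_reported = 7089
chromosome_genes = 6092
plasmid_genes = 997
with_function = 5274
unknown_function = 1815

# Both partitions of the gene set should add up to the reported total.
by_replicon = chromosome_genes + plasmid_genes
by_annotation = with_function + unknown_function

# Share of protein-coding genes carried on plasmids, in percent.
plasmid_share = round(100 * plasmid_genes / total_reported, 1)

print(by_replicon, by_annotation)  # 7089 7089
print(plasmid_share)               # 14.1
```

About one in seven protein-coding genes is therefore plasmid-borne.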

Table 4 Genome statistics
Table 5 Number of protein encoding genes associated with general COG functional categories

Insights from the genome sequence

To investigate the phylogeny of B. thuringiensis MYBT18246, two approaches were used. First, nineteen Bacillus strains were chosen for 16S rRNA analysis within the Bacillus clade (Fig. 2a). The 16S rRNA phylogeny shows that B. thuringiensis MYBT18246 clusters with other Bcsl group members within the Bacillus clade. However, the low bootstrap values confirm the limitations of 16S rRNA as a discriminatory marker within the Bcsl species group. Second, we applied an MLST approach based on the scheme by Priest et al. [36]. This revealed that MYBT18246 clusters with the toxin-cured B. thuringiensis Bt407, the insecticidal B. thuringiensis serovar chinensis CT-43 and the nematicidal B. thuringiensis YBT-1518 within the Bcsl phylogeny (Fig. 2b). Based on this phylogeny and the defining phenotypic feature of the species B. thuringiensis (the ability to produce crystal toxins against invertebrates, including nematodes), strain MYBT18246 can be safely classified as a nematicidal B. thuringiensis.

For a detailed analysis of encoded toxins in B. thuringiensis MYBT18246, we generated a local database consisting of all available Cry, Cyt, Vip and Sip protein sequences from UniProtKB [71] and GenBank [72]. The database was curated to generate a set of non-redundant reference toxins. In total, we identified three different cry toxin genes in the B. thuringiensis MYBT18246 genome and classified them as cry13Aa2 (>95%), cry13Ba1 (<78%) and cry13Ab1 (<95%), based on the sequence similarity scheme of the Cry toxin nomenclature committee by Crickmore [13, 70]. Notably, these cry toxin genes are encoded on the chromosome and not on extra-chromosomal elements, as has been previously reported for the vast majority of cry toxin genes [7, 73, 74]. The toxin gene analysis revealed four additional putative toxin-like genes on plasmids with sequence similarity to cry genes and vip genes. A Pfam domain analysis using InterPro [75] revealed a putative Vip-like toxin encoded on p120510, a putative Cry-like toxin encoded on p120416 and two putative Bin-like toxins encoded on p109822, which are candidates for future studies.
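
The identity-based naming logic behind these assignments can be sketched in Python. The 78% and 95% boundaries correspond to the values quoted above; the 45% boundary for the primary rank comes from the published Crickmore nomenclature rather than from this text, and the handling of values exactly on a boundary, the function name `differing_rank` and the example identities are assumptions made for illustration.

```python
def differing_rank(identity_percent):
    """Return the nomenclature rank at which a new Cry toxin differs
    from its closest already named relative (illustrative sketch)."""
    if not 0 <= identity_percent <= 100:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_percent < 45:
        return "primary"     # new Arabic number, e.g. a new Cry family
    if identity_percent < 78:
        return "secondary"   # new uppercase letter, e.g. cry13A -> cry13B
    if identity_percent < 95:
        return "tertiary"    # new lowercase letter, e.g. cry13Aa -> cry13Ab
    return "quaternary"      # new final number, e.g. cry13Aa1 -> cry13Aa2

# Illustrative identities only; these are not measurements from this study.
examples = {97.0: "quaternary", 90.0: "tertiary", 70.0: "secondary", 30.0: "primary"}
for identity, expected in examples.items():
    print(identity, differing_rank(identity), expected)
```

Under this scheme, a name such as cry13Ba1 signals less than 78% identity to the cry13A toxins, whereas cry13Aa2 signals more than 95% identity to cry13Aa1.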

Additionally, the B. thuringiensis MYBT18246 chromosome was screened for prophage regions using the PHAge Search Tool (PHAST) with default parameters. PHAST identifies prophage regions based on key genes from a reference database and defines the boundaries using a genomic composition-based algorithm. For a more detailed description see [76]. A total of 16 putative prophage loci were identified in the chromosome, including three that were associated with the previously identified chromosomally encoded cry toxin genes. As shown in Fig. 3, the cry toxins (displayed in red, track 4) are located in close proximity to identified prophage regions (displayed in blue, track 3). Furthermore, all extra-chromosomal elements of B. thuringiensis MYBT18246 were screened for prophages to check whether phages reside in a linear or circular state in the host, as was reported in 2013 by Fortier et al. [77]. Apparently intact phage regions were identified according to the PHAST score system on p150790, p120416, p109822, p101287 and p46701.

Fig. 3

Circular visualization of the genome comparison of B. thuringiensis MYBT18246 with 3 other sequenced B. thuringiensis strains. The tracks from the outside represent: (track 1–2) genes encoded by the leading and lagging strand of B. thuringiensis MYBT18246 marked in COG colors [93]; (track 3) putative prophage regions, identified with PHAST, in blue [76]; (track 4) identified cry toxin genes in red; (track 5–7) orthologs for the genomes of B. thuringiensis YBT-1518 (CP005935.1), B. thuringiensis CT-43 (CP001907.1) and B. thuringiensis Bt407 (CP003889.1) illustrated in red to light yellow, singletons in grey (grey: <1e−20; light yellow: 1e−21–1e−50; gold: 1e−51–1e−90; light orange: 1e−91–1e−100; orange: 1e−101–1e−120; red: >1e−121); (track 8) % GC plot; (track 9) GC skew [(G − C)/(G + C)]. Visualization was done with DNAPlotter [94]

The finding of prophage-associated cry genes in strain MYBT18246 indicates that phages may serve as vectors for the transmission of virulence factors within the species B. thuringiensis. This resembles the previously described lysogenic conversion of pathogens by phages [78], supporting the idea that phages may represent a driving force for the distribution of fitness factors as well as virulence factors [78,79,80]. The finding that toxins, which are generally specific for a certain type of host organism, are located within a mobile genomic element in the chromosome of this bacterium suggests that phages of strain MYBT18246 may contribute to adaptation to different hosts [81,82,83].

Extended insights

Based on the proximity within the tree (Fig. 2b), the genomes of B. thuringiensis Bt407, B. thuringiensis serovar chinensis CT-43 and B. thuringiensis YBT-1518 were identified as the closest relatives and selected for an in-depth comparative analysis. Shared gene contents were determined, visualized and compared, with a focus on known virulence factors such as cry toxins and pathogenic driving forces such as phages. The analysis revealed unique as well as shared gene contents for each strain (Fig. 3). In Fig. 3 the outer rings represent the genes on the leading and lagging strand with COG classification. The inner rings (track 5–7) illustrate the orthologous genes of B. thuringiensis YBT-1518, B. thuringiensis CT-43 and B. thuringiensis Bt407 in red (high similarity) to light yellow (low similarity), and white (no similarity). The circular representation of the chromosome comparison revealed that prophages are a major source of regional differences between the strains (Fig. 3). Additionally, the pan-genome of B. thuringiensis MYBT18246 compared to the three closest relatives was determined (Fig. 4). Orthologous genes between all four organisms were identified by comparing the whole genomes using Proteinortho [84] with a similarity cutoff of 50% and an E-value of 1e−10. Gbk files were downloaded from NCBI and the protein sequences were extracted using cds_extractor v0.7.1 [85]. Detected paralogous genes are displayed in the Venn diagram in Fig. 4. All four strains share a core genome of 4298 genes. This is equivalent to 67% of each genome. Bacillus thuringiensis MYBT18246 shares 4 additional genes exclusively with B. thuringiensis Bt407, 17 genes with B. thuringiensis serovar chinensis CT-43 and 327 genes with B. thuringiensis YBT-1518. Bacillus thuringiensis serovar chinensis CT-43 and B. thuringiensis Bt407 share 398 orthologous genes. Notably, the genome of B. thuringiensis MYBT18246 contains 1242 orphan genes and thus two- to threefold more singletons than the compared genomes. This result confirms the high degree of conservation among the four B. thuringiensis strains (Fig. 2a and b) and also refines the phylogenetic relationship of the strains to each other based on non-orthologous regions. Singletons are located on the chromosome as well as on extra-chromosomal elements. The density of singletons is 2.5-fold higher on the plasmids. Notably, all major chromosomal differences can be attributed to prophage regions. All gene products were assigned to COG categories and investigated for Pfam domains and signal peptides (Table 6). In detail, those genes code for (i) phage proteins, (ii) morons (virulence factors) and (iii) a vast majority of proteins with cryptic function. This is supported by Fig. 3, which clearly shows that the regions of difference (track 5–7) directly correspond to the regions of identified phages (track 3). Moreover, the identified cry toxins (track 4) are adjacent to identified prophage regions and could be regarded as morons. Additionally, the singletons were screened for further virulence factors, and genes encoding a type IV secretion system, a C5-methyltransferase and type-restriction enzymes, as well as genes involved in sporulation, resistance and genetic competence, were identified. In particular, the finding of restriction-modification systems indicates a protection mechanism against other phages and plasmids and thus forms a putative barrier against further genomic modification.
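
How the counts for such a Venn diagram can be derived from an ortholog table is sketched below in Python. The sketch assumes a tab-separated table in the style of Proteinortho output, with three leading summary columns followed by one column per strain and `*` marking absence; this layout, the function name `group_membership` and all gene identifiers are assumptions for illustration, and the toy rows are not data from this study.

```python
import csv
import io
from collections import Counter

def group_membership(table_text):
    """Count ortholog groups by the exact set of strains in which they occur."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    strains = header[3:]  # assumed layout: three summary columns come first
    counts = Counter()
    for row in reader:
        present = frozenset(
            strain for strain, cell in zip(strains, row[3:]) if cell != "*"
        )
        counts[present] += 1
    return strains, counts

# Toy table with invented gene identifiers.
rows = [
    ["# Species", "Genes", "Alg.-Conn.", "MYBT18246", "Bt407", "CT-43", "YBT-1518"],
    ["4", "4", "1", "m_0001", "b_0001", "c_0001", "y_0001"],
    ["4", "5", "0.9", "m_0002,m_0003", "b_0002", "c_0002", "y_0002"],
    ["2", "2", "1", "m_0004", "*", "*", "y_0003"],
    ["2", "2", "1", "*", "b_0003", "c_0003", "*"],
]
table_text = "\n".join("\t".join(row) for row in rows)

strains, counts = group_membership(table_text)
core = counts[frozenset(strains)]
shared_with_ybt = counts[frozenset({"MYBT18246", "YBT-1518"})]
print(core, shared_with_ybt)  # 2 1
```

Each distinct set of strains corresponds to one field of the Venn diagram. Genes without any ortholog do not form a group in such a table, so singletons would have to be counted separately, for example as all genes of a strain that appear in no row.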

Fig. 4

Venn diagram of the genome comparison of B. thuringiensis MYBT18246 with other B. thuringiensis strains. The Venn diagram displays the orthologous genes between B. thuringiensis MYBT18246 (CP015350-CP015361), B. thuringiensis YBT-1518 (CP005935-CP002486), B. thuringiensis serovar chinensis CT-43 (CP001907-CP001917) and B. thuringiensis Bt407 (CP003889-CP003898). Ortholog detection was performed with Proteinortho [84], including protein BLAST with a similarity cut-off of 50% and an E-value of 1e−10. The total numbers of genes and paralogs are depicted under the corresponding species name. Open reading frames that were classified as pseudogenes were not included in this analysis

Table 6 General genome features of B. thuringiensis MYBT18246 and close relatives

Conclusion

In this work we present the whole-genome sequence of B. thuringiensis MYBT18246 and its specific genome features. The genome includes three nematicidal cry13 gene variants located on the chromosome, which were named cry13Aa2, cry13Ba1 and cry13Ab1 according to sequence similarity, as specified by the Cry Toxin Nomenclature Committee. Four additional putative toxin genes with low sequence similarity to other known toxins were identified on plasmids: p120510 (Vip-like toxin), p120416 (Cry-like toxin) and p109822 (two Bin-like toxins). These toxins contain complete toxin domains, yet their activity against potential hosts remains to be elucidated in future studies. The genome comprises a large number of mobile elements involved in genome plasticity, including eleven plasmids and sixteen chromosomal prophages. Both plasmids and prophages are important elements of horizontal gene transfer (HGT), indicating that they are a driving force for the evolution of pathogens. The most striking finding is the close proximity of the chromosomal nematicidal cry toxin genes to three distinct prophages, indicating a contribution of phages to defining the host range of this strain. B. thuringiensis MYBT18246 may have potential as a biocontrol agent against nematodes, which should be addressed in future experiments.