The members of the genus Bacillus are low G+C Gram-positive bacteria that belong to the family Bacillaceae, order Bacillales, class Bacilli, phylum Firmicutes, domain Bacteria. They grow in a diverse range of habitats and include species that grow at extremes of temperature, salinity and pH. The phenotypic traits and physiologies of members of the genus are as wide ranging as their habitats: they include heterotrophs and autotrophs, species that grow in the presence or absence of oxygen, and others that grow with alternative electron acceptors such as iron and arsenate (Kanso et al. 2002). Their extreme metabolic diversity, their non-pathogenic nature and the ease of isolating, handling and maintaining them have opened them to biotechnological exploitation, in particular for use in agriculture, e.g. crop protection (Gao et al. 2015; Hussein et al. 2015; Ji et al. 2008). 16S rRNA-based taxonomy is routinely used to build taxonomic relationships amongst members of the domain Bacteria, but it is well established that this technique fails to resolve closely related Bacillus species in which evolutionary divergence is limited. Given the lack of discriminatory power of 16S rRNA genes, a polyphasic taxonomic approach which combines 16S rRNA sequence analysis with phenotypic traits and DNA–DNA homology has been recommended to improve resolution. Examples of such an approach include the members of the Bacillus cereus sensu lato group, which currently comprises 6 species (B. cereus, B. anthracis, B. thuringiensis, B. mycoides, B. pseudomycoides and B. weihenstephanensis) (Zwick et al. 2012), and the B. subtilis group, which comprises 3 species, B. subtilis, B. vallismortis and B. mojavensis (Roberts et al. 1994, 1996). However, the topology of polyphasic trees is not always robust and can differ from that of 16S rRNA-based trees, and therefore delineating species and strain boundaries using a polyphasic approach can be confusing.
With the advent of high-throughput, cost-effective Next Generation Sequencing (NGS) technologies, estimates of the overall similarity of microbial genomes by Genome-to-Genome Distance Comparison (GGDC) and Average Nucleotide Identity (ANI), together with genome phylogeny, are being considered not only for delineating closely related species but also for taxonomic assignment of new isolates (Maughan and Van der Auwera 2011; Yi et al. 2014). Here we describe the construction of a molecular phylogenetic framework for Bacillus subtilis using genome sequences and its application to strain D7XPN1, an isolate from Baku Baku King™, a commercial food-waste degrading bioreactor (Adelskov 2013; Adelskov and Patel 2014).

Materials and methods

Bioreactor operation and sample collection

Biodegradation of food waste was carried out using Baku Baku King™ (model: M.I.G.0100), a food waste bioreactor. The biodegradation process was started in the bioreactor (day 0) by mixing 50 kg of municipal food waste sourced from a local hotel (Novotel Hotel, Gold Coast) with 200 kg of methyl bromide treated Japanese Larch wood chips and a microbial starter seed culture. The starter seed culture was prepared by mixing 10 kg of leaf litter (collected from the Forest reserve located at Griffith University Nathan campus, Brisbane, Australia) with nutrients (sugar, honey, milk, meat, fish), followed by incubation for one week at room temperature (25–30 °C) before inoculation into the bioreactor. The bioreactor was subsequently fed 50 kg of municipal food waste per day sourced from the same hotel over a 49 day period. Samples (approximately 500 g) were collected from the bioreactor every 7–10 days of its normal operation cycle of 49 days, including day 0 (the day of inoculation of the bioreactor with the starter seed culture). In total, seven samples were collected (days 0, 1, 7, 17, 21, 29 and 49). The temperature and aeration in the bioreactor were controlled by the bioreactor’s inbuilt electronic system.

Enrichment, isolation and phylogeny

One gram of each bioreactor sample was resuspended in 10 ml sterile dH2O, the suspension was shaken for 2 min and the debris was allowed to settle for 5 min. A 200 µl aliquot of the settled suspension was spread on dTSA agar medium [dTSA consisted of 0.1 % (w/v) Tryptic Soy Broth, 1.5 % (w/v) bacteriological agar, pH 7.2]. The temperature of the bioreactor during most of the operation period was between 40 and 45 °C and hence the plates were incubated at 45 °C until colonies developed. Single well-isolated colonies that appeared morphologically distinct were picked and resuspended in sterile dH2O, and a loopful was streaked onto dTSA agar plates and incubated at 45 °C until colonies developed. This procedure was repeated several times before the isolates were considered to be pure. Twenty-eight pure cultures (Table 1) were obtained using this process and were stored at 4 °C and −20 °C. All isolates were routinely cultured using 0.1 % (w/v) (dilute) Tryptic Soy Broth (TSB), pH 7.2.

Table 1 List of strains isolated from the waste-food degrading bioreactor, Baku Baku™

The 28 isolates were cultured in TSB (pH 7.0) at 45 °C for 18 h, the cells were centrifuged and the DNA from the pelleted cells was purified using a modification of Marmur’s method as described by Ogg and Patel (2009). In brief, bacterial cells were resuspended in a buffered solution (50 mM Tris, 10 mM EDTA, pH 7.8), treated with 0.8 mg/ml lysozyme, 0.3 mg/ml Achromopeptidase and 0.1 mg/ml RNAse A, and subsequently lysed by adding 0.12 mg/ml Proteinase K and 6 mg/ml sodium dodecyl sulphate (SDS) to the suspension. The DNA from the lysate was purified using phenol:chloroform extraction. Purified DNA quality was assessed by agarose gel electrophoresis and DNA concentration was determined fluorometrically using a Qubit™ dsDNA HS assay kit as described by the manufacturer (Life Technologies, USA). The 16S rRNA gene was amplified from the DNA of each isolate by PCR using the universal forward primer Fd1 (AGAGTTTGATCCTGGCTCAG) and reverse primer Rd1 (AAGGAGGTGATCCAGCC), which bind to positions 8–27 and 1512–1493 of the E. coli numbering scheme according to Winkler and Woese (1991). Reactions of 50 µl volume consisted of 0.2 mM dNTP, 2 mM MgCl2, 1 mM Fd1, 1 mM Rd1, 0.5–5 ng of DNA template, 2.5 U of Taq polymerase (Mango Taq) and the provided reaction buffer. PCR was performed using a Corbett Research FTS-1 Thermal sequencer with the following cycle program: cycle 1: 2 min 95 °C, 1 min 50 °C, 2 min 70 °C; cycles 2–32: 55 s 94 °C, 1 min 50 °C, 2 min 72 °C. The amplicons were purified by either SureClean™ or gel extraction (QIAGEN) following the manufacturer’s instructions. The purified amplicons were sequenced on an ABI 3730xl 96-capillary sequencer using Fd1 and Rd1 primers at the Australian Genome Research Facility (AGRF). 16S rRNA gene sequence manipulation and phylogenetic analysis were performed as described previously (Redburn and Patel 1994).

All isolates were screened for the presence of amylase activity by inoculating a loopful of culture onto dTSA agar medium supplemented with 1 % soluble potato starch (Chem Supply, Australia) and incubating at 45 °C until colonies developed. The plates were flooded with Gram’s Iodine solution and a positive reaction for amylase production was recorded for isolates when there was a zone of clearance around colonies against a red–purple background. Isolates were screened for xylanase activity by streaking onto X-xyl Agar plates followed by incubation at 45 °C. X-xyl Agar plates contained (per litre of distilled water): Tryptic Soy Broth (Oxoid, USA) 1 g, xylan from birchwood (Sigma, USA) 3 g, Bacteriological Agar (Oxoid, USA) 15 g, X-β-D-xyloside (Gold Biotech, USA) 0.2 g, pH 7.2. Xylanase production was recorded as positive when colonies showed a blue color.

One of the isolates designated D7XPN1, which produced a xylanase and amylase, was selected for further studies and is described in more detail in this paper.

Characterization of strain D7XPN1

Temperature, pH and salinity growth studies were conducted in 18 mm glass culture tubes containing 15 ml of modified Luria Bertani Broth (mLBB). mLBB contained per litre: Luria Bertani Broth (Oxoid, USA) 25 g, d-Glucose anhydrous (Lab Supply, Australia) 3 g. For temperature studies, mLBB was inoculated with 0.2 ml of an overnight culture and incubated for 48 h in water baths maintained at 37, 45, 50, 60 and 70 °C. For pH studies, the pH of mLBB was adjusted to the desired value (pH range of 4.0–10) by addition of 1 M HCl or 1 M NaOH, inoculated with 0.2 ml of an overnight culture and incubated in water baths maintained at 45 °C. For salinity studies, appropriate amounts of NaCl were weighed and added to mLBB medium to achieve the desired salinity (3–7 %), and the medium was inoculated with 0.2 ml of an overnight culture and incubated in a water bath maintained at 45 °C for 48 h. Following incubation, growth was determined by inserting the glass culture tubes directly into a modified cuvette holder of a Novaspec LKB spectrophotometer and measuring the absorbance at 600 nm. Anaerobic growth was tested in Trypticase, Yeast Extract, Glucose (TYEG) medium as described previously (Ogg and Patel 2009).

Dataset and genome sequencing

Unless indicated otherwise, all computational analysis was performed using a 16 CPU Dell workstation with 64 gigabytes RAM and an Intel® Xeon(R) CPU X5570 @ 2.93 GHz × 8 chipset running Ubuntu 12.04 and the Australian Government Information Technology Infrastructure Facilities accessed under the National eResearch Collaboration Tools and Resources (NeCTAR) program.

Complete and draft Whole Genome Sequences (WGS) for all strains that were identified in GenBank microbial genome database as Bacillus subtilis, B. amyloliquefaciens and B. atrophaeus were downloaded from the NCBI ftp server (release 204) (Table 2). Unless indicated otherwise, sequence contigs from all GenBank files were extracted and converted to fasta format (Vesth et al. 2013) and any plasmid sequences accompanying the genome data (B. subtilis subsp. natto str. BEST195, B. subtilis subsp. str. NCIB 3610 and B. subtilis subsp. subtilis str. B7-s) were removed before use in comparative genomic studies.
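The plasmid-removal step described above can be illustrated with a minimal sketch. It is not the tool used in this study (contig extraction followed Vesth et al. 2013); it assumes that plasmid records can be recognised by the word "plasmid" in the FASTA header, and the record names in the example (contig1, contig2, pXX) are hypothetical.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_plasmids(records):
    """Keep only records whose header does not mention a plasmid.

    Assumption: plasmid sequences carry the word 'plasmid' in their header.
    """
    return [(h, s) for h, s in records if "plasmid" not in h.lower()]


def to_fasta(records, width=70):
    """Serialise (header, sequence) pairs back to FASTA text."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical two-record input: one chromosomal contig and one plasmid.
example = """>contig1 Bacillus subtilis chromosome
ATGCATGC
ATGC
>contig2 Bacillus subtilis plasmid pXX
GGGG
"""
kept = drop_plasmids(parse_fasta(example))
print(to_fasta(kept), end="")
```

Filtering on the header text is a convenience only; for real GenBank files the record annotations would be a more reliable source for deciding which sequences are plasmids.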

Table 2 Bacillus subtilis strains used in determining the Overall Genome Relatedness Indices (OGRI) using Genome-to-Genome Distance Calculations (GGDC) and Average Nucleotide Identity by Blast (ANIb) and phylogenomics
Table 3 Genomes misclassified as Bacillus subtilis based on ANIb analysis

Library construction and sequencing of the genome of strain D7XPN1 were performed at the Australian Genome Research Facility (AGRF) core facility on an Ion Torrent PGM sequencer using a 318 chip. The sequencing data were converted to FASTQ format and adapters were removed from individual reads. The quality of the sequencing data was assessed using PRINSEQ (Schmieder and Edwards 2011). Genomic contigs were assembled from reads using the GS de novo Assembler (Newbler) software. The assembled draft genome was annotated using Prokka, version 1.10 (Seemann 2014) and the RAST automated annotation pipeline server (Aziz et al. 2008), which employs subsystems technology to identify genes related to different categories of cellular processes and metabolism (Overbeek et al. 2014). The whole-genome shotgun project of Bacillus subtilis strain D7XPN1 (= KCTC 33554, JCM 30051) has been deposited at DDBJ/EMBL/GenBank under the accession number JHCA00000000. The version described in this paper is version JHCA00000000.1.

Estimation of overall genome relatedness indices (OGRI) for strain D7XPN1

OGRI methods depend on comparisons of whole genomes rather than single genes or a set of genes, and ANIb has been established as a method of choice. ANIb was calculated from genome nucleotide sequences using a script that incorporates the ANI algorithm of Richter and Rossello-Mora (2009). In addition, intergenomic e-DDH distances were calculated using the Genome-to-Genome Distance Calculator (GGDC) with the recommended formula as described by Meier-Kolthoff et al. (2013).
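To make the ANIb idea concrete, the sketch below averages the BLAST identities of query-genome fragments against a reference genome. It is an illustration only and not the script used here: the fragment size (1020 nt) and the two filters (more than 30 % identity recalculated over the whole fragment, and an aligned region covering at least 70 % of the fragment) are assumptions based on how BLAST-based ANI is commonly described, and the hit values are made up.

```python
def fragment(seq, size=1020):
    """Cut a genome sequence into consecutive fragments of `size` nt.

    A trailing piece shorter than `size` is discarded.
    """
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def ani_from_hits(hits, fragment_len=1020,
                  min_identity=30.0, min_aligned_fraction=0.70):
    """Mean percent identity of the fragment hits that pass both filters.

    hits: list of (percent_identity, alignment_length) tuples, one best
    BLASTN hit per query fragment. Returns None if no hit passes.
    Thresholds are assumptions (see the accompanying text).
    """
    kept = []
    for identity, aln_len in hits:
        aligned_fraction = aln_len / fragment_len
        # Identity recalculated over the whole fragment, not just the aligned part.
        overall_identity = identity * aligned_fraction
        if overall_identity > min_identity and aligned_fraction >= min_aligned_fraction:
            kept.append(identity)
    if not kept:
        return None
    return sum(kept) / len(kept)


# Hypothetical best hits for four fragments of a query genome.
hits = [(98.0, 1020), (96.0, 900), (99.0, 300), (25.0, 1020)]
ani = ani_from_hits(hits)
print(ani)
```

In this toy input the third hit is rejected because it aligns over too little of the fragment and the fourth because its identity is too low, so only the first two contribute to the mean.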


The use of groups of orthologous proteins or a set of common conserved genes across genomes is required for phylogenomic analysis and several methods are available for such studies. In our study, we first used Prokka (version 1.10) (Seemann 2014) to identify open reading frames (ORF) in the genome nucleotide sequences (n = 43), which were subsequently translated into putative protein sequences and annotated. We then created two protein datasets for phylogenomic studies. For the first, we used the hal pipeline with default settings to find and extract all protein orthologs (Robbertse et al. 2011) and for the second, we used CD-Hit (Huang et al. 2010) to select a smaller but highly conserved set of orthologous proteins from the annotations (>97 % AA identity). Additionally, we used the dataset of 6 conserved MLST genes (glpF, pta, purH, pycA, rpoD and tpiA) typically used for typing strains of Bacillus subtilis for phylogenomic Multilocus Sequence Analysis (MLSA). The 6 genes were downloaded from the pubMLST database and used in local BLAST queries against the Bacillus subtilis genome database of open reading frames (ORF), and the MLST genes were retrieved. The datasets of the conserved protein sequences from hal and CD-Hit and the MLST nucleotide sequences were aligned separately using CLUSTALW (Larkin et al. 2007), concatenated into a single super-alignment per dataset and used to construct Maximum Likelihood trees using PhyML (Guindon et al. 2010).
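The concatenation step can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes each per-gene alignment is held as a dictionary mapping a strain name to its aligned sequence, and the short sequences in the example are invented.

```python
def concatenate_alignments(alignments, taxa):
    """Join per-gene alignments into one super-alignment.

    alignments: list of dicts mapping taxon name -> aligned sequence;
                all sequences within one dict must have equal length.
    taxa:       names to include, in output order. A taxon missing from
                a gene alignment is padded with gap characters.
    """
    supermatrix = {t: [] for t in taxa}
    for aln in alignments:
        lengths = {len(s) for s in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for t in taxa:
            supermatrix[t].append(aln.get(t, "-" * width))
    return {t: "".join(parts) for t, parts in supermatrix.items()}


# Invented toy alignments for two strains and three genes.
gene1 = {"168": "MKT-A", "D7XPN1": "MKTLA"}
gene2 = {"168": "GG", "D7XPN1": "GA"}
gene3 = {"168": "AAA"}  # gene absent from D7XPN1 in this toy example
superaln = concatenate_alignments([gene1, gene2, gene3], ["168", "D7XPN1"])
print(superaln)
```

Padding absent genes with gaps keeps every row the same length, which tree-building programs require; a core-gene dataset such as the ones described above would by construction need no padding.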


Strain isolation and phylogenetic identification

Twenty-eight strains were isolated from bioreactor samples collected over 49 days of operation. The strains were selected on the basis of differences in colony morphology, growth rates and enzyme screening (amylase and xylanase) (Table 1). 16S rRNA gene analysis (sequence length between 560 and 1555 bp) revealed that 27 of the 28 isolates were members of the genera Bacillus, Paenibacillus, Kurthia and Aneurinibacillus, phylum Firmicutes, domain Bacteria (Fig. 1), whereas the 28th isolate was identified by 18S rRNA gene sequence analysis (sequence length 1738 bp) as Ogataea polymorpha, a yeast (data not shown). Further phylogenetic analysis showed that 9 of the isolates cultured from samples taken from the bioreactor on days 0, 1, 7, 17, 21, 29 and 49 of its operation cycle (Table 1), which included strain D7XPN1, were closely related to Bacillus subtilis (99 % similarity) (Fig. 1).

Fig. 1

Phylogenetic distance tree constructed from partial 16S rRNA sequences (467 nucleotides) of bacterial strains isolated from samples retrieved from Baku Baku™, a food waste bioreactor. Distances were estimated using the Jukes and Cantor model. Bootstrap percentages after 1000 replications are shown. Scale bar represents one nucleotide change in every 100 nucleotides. Refer to Table 2 for the type strains in this figure.
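For readers unfamiliar with the Jukes and Cantor correction used for this tree, the sketch below computes it for a pair of aligned sequences. The corrected distance is d = −(3/4) ln(1 − (4/3) p), where p is the observed proportion of differing sites. The function names and the two short sequences are illustrative only; skipping gapped positions is an assumption, not a description of the software used for Fig. 1.

```python
import math


def p_distance(a, b):
    """Observed proportion of differing sites between two aligned sequences.

    Positions with a gap ('-') in either sequence are ignored (assumption).
    """
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("distance undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Two invented 10-nt aligned sequences differing at one site.
p = p_distance("ACGTACGTAC", "ACGTACGTAT")
d = jukes_cantor(p)
print(round(p, 3), round(d, 4))
```

The correction is small when sequences are nearly identical, as is the case for the B. subtilis-like isolates here, and grows as the observed difference approaches saturation.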

Phenotypic characterization of strain D7XPN1

The cells of strain D7XPN1 were short rods which stained Gram positive and produced cream coloured, opaque, raised, irregular-shaped colonies on dTSA medium. Strain D7XPN1 grew optimally at 45 °C (growth temperature range between 24 and 50 °C), suggesting that it was a thermotolerant/moderate thermophile, and at pH 7 (pH growth range between pH 5 and pH 9). The strain tolerated up to 7 % NaCl (the highest concentration tested) and grew anaerobically by fermentation.

Genome studies of strain D7XPN1

A total of 722,222 reads with a mean read length of 196.13 bp (total of 141,651,194 bp) were produced using IonTorrent™. The assembly of these reads with the GS assembler (Newbler) produced 28 genomic contigs (average coverage of 40x) with contig sizes ranging from 1,017,528 to 510 bp in length and an N50 of 504,008 bp. The RAST server identified a total of 5116 genomic features that included 69 RNA and 5047 protein coding sequences (Fig. 2). Of the 5047 protein coding sequences, 2320 were placed into functional subsystems. The two subsystem categories related to carbohydrates and to amino acids and derivatives had the highest number of associated features, with 615 and 506 coding features, respectively.
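The N50 and average coverage reported above are simple summary statistics, and the sketch below shows how they are conventionally computed. The contig lengths in the example are hypothetical, since the individual contig lengths of the D7XPN1 assembly are not listed here, and the function names are illustrative.

```python
def n50(lengths):
    """Length of the contig at which the running total of contigs, sorted
    from longest to shortest, first reaches half of the assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs supplied")


def mean_coverage(total_read_bases, assembly_size):
    """Average sequencing depth: sequenced bases divided by assembly size."""
    return total_read_bases / assembly_size


# Hypothetical contig lengths (bp); total 290, so half is 145.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))            # running totals: 80, 150 -> N50 is 70
print(mean_coverage(400, 10))  # 40.0
```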

Fig. 2

RAST annotation gene functional categories of Bacillus subtilis str. D7XPN1. a The pie graph of the total number of CDS is divided into 3 categories (in %): CDS annotations that are represented in the RAST subsystems, those that are not represented (non-subsystem), and those that are hypothetical. b The bar graph shows the total subsystem CDS annotations by major subsystem categories

Estimation of overall genome relatedness indices (OGRI)

Forty-nine strains which had a 16S rRNA gene sequence similarity of ≥97 % to B. subtilis strain 168, together with genomes of the members of the B. subtilis sensu lato group, B. atrophaeus and strain D7XPN1, were initially used in the ANIb studies (Table 2). The study showed that 43 strains with ANIb similarity values >92 % could be regarded as members of B. subtilis (Fig. 3), whereas the remaining 6 strains, whose similarity values were <92 %, needed to be examined more closely. Closer examination showed that B. subtilis strain GBO3, B. subtilis strain SPZ1 and B. subtilis str. NKYL29 shared a high ANIb similarity value (97 %) with B. amyloliquefaciens, and that B. subtilis subsp. niger str. PCI shared a high value (99.9 %) with B. atrophaeus (Table 3). Additionally, a review of the literature showed that the genome of B. subtilis strain BEST7613 was a chimeric construct of the genomes of Synechocystis strain PCC6803 and B. subtilis 168 (Watanabe et al. 2012), and examination of the statistics of the genome of B. subtilis B7-S showed that it was substantially different in size (5.3 Mb) and in G+C mol % content (35.1) from the 43 members of the B. subtilis cluster. These 6 misclassified strains were therefore removed from further analysis.

Fig. 3

ANIb heatmap of Bacillus subtilis genome sequences (n = 43). Individual genome-to-genome ANIb values are represented in the central bi-color gradient heatmap; the color key is shown on the top left hand side and includes a division line for the recommended subspecies cutoff value (97 %). The heatmap is accompanied by a hierarchical clustering dendrogram with four distinct clades (clusters): B. subtilis subsp. subtilis (A, green), B. subtilis subsp. stecoris (B, pink), B. subtilis subsp. spizizenii (C, yellow) and B. subtilis subsp. inaquosorum (D, blue). Refer to Table 2 for the type strains in this figure

Fig. 4

GGDC estimated DNA–DNA Hybridization (DDH, formula 2) of a selected set of B. subtilis genomic strains representing the three current sub-species and the proposed fourth sub-species

The remaining 43 strains could be further grouped into 4 clusters based on ANIb similarity values (Fig. 3). Cluster 1 constituted the largest group (33 strains) with an ANIb similarity value of >98 % amongst the members and is represented by the taxonomically validated B. subtilis subsp. subtilis. Of the 33 strains, 16 have already been identified as members of cluster 1. Cluster 2 consists of the two newly isolated strains JS (Song et al. 2012) and D7XPN1 (Adelskov and Patel 2014) (Fig. 2), which have an ANIb similarity value of 95.6 % to cluster 1 and 98.8 % to each other. These two strains have not been taxonomically validated previously and in this report we propose to describe them as members of a new subspecies, B. subtilis subsp. stecoris, of which strain D7XPN1T is the type strain. Cluster 3 is composed of two strains of the taxonomically validated B. subtilis subsp. inaquosorum (Table 1) with ANIb similarity values of 92 and 93 % to clusters 1 and 2, respectively, and 98.6 % to each other. The remaining 6 strains belong to a loose cluster represented by the taxonomically validated B. subtilis subsp. spizizenii, which have ANIb similarity values of between 92 and 94 % with members of clusters 1, 2 and 3. Of the 6 strains, 4 have been correctly identified as members of this cluster (Table 2). Four of the 6 strains (B. subtilis str. BSC154, B. subtilis subsp. spizizenii str. W23, B. subtilis subsp. spizizenii str. ATCC 6633 and B. subtilis str. BST) group closely together (ANIb value of >99 %), whereas B. subtilis subsp. subtilis str. DV1-B-1 and B. subtilis subsp. spizizenii str. TU-B-10 are more distant from these 4 strains (ANIb values of 95.6 and 92.2 %, respectively).
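The grouping of genomes by an ANIb cutoff can be sketched with a simple threshold-based (single-linkage) clustering. This is an illustration, not the hierarchical clustering used to draw Fig. 3: the 97 % cutoff is the subspecies line shown in that figure, the strain labels are shorthand, and the pairwise values are an illustrative matrix loosely based on the between-cluster values quoted above rather than the measured data.

```python
def cluster_by_threshold(names, ani, threshold=97.0):
    """Group genomes so that any pair with ANI >= threshold shares a cluster.

    names: list of genome names.
    ani:   dict mapping (name_a, name_b) -> ANI value in percent.
    Returns a sorted list of sorted clusters (single-linkage grouping).
    """
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), value in ani.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Illustrative pairwise ANIb values (percent) for five representative genomes.
names = ["subtilis_168", "D7XPN1", "JS", "inaquosorum", "spizizenii_W23"]
ani = {
    ("D7XPN1", "JS"): 98.8,
    ("subtilis_168", "D7XPN1"): 95.6,
    ("subtilis_168", "JS"): 95.6,
    ("subtilis_168", "inaquosorum"): 92.0,
    ("D7XPN1", "inaquosorum"): 93.0,
    ("JS", "inaquosorum"): 93.0,
    ("subtilis_168", "spizizenii_W23"): 92.5,
    ("D7XPN1", "spizizenii_W23"): 93.0,
    ("JS", "spizizenii_W23"): 93.0,
    ("inaquosorum", "spizizenii_W23"): 93.5,
}
clusters = cluster_by_threshold(names, ani)
print(clusters)
```

With these values only strains D7XPN1 and JS exceed the cutoff with each other, so they form one group and each remaining representative stands alone, mirroring the four-cluster picture described above.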

DNA homology using genome-to-genome-distance-calculator (GGDC)

The DNA–DNA hybridization (DDH) method is a gold standard that is used to differentiate species of the same genus when the 16S rRNA sequence similarity is >97 %. The widely accepted species boundary set by the DDH method is 70 %. The Genome-to-Genome-Distance-Calculator (GGDC) is an in silico alternative to the traditional experimental DDH method and is the second OGRI method used in our study. As all the B. subtilis strains (n = 43) have a 16S rRNA similarity value >97 %, we calculated the GGDC similarity indices of representative genomes from each of the 4 sub-species clusters. The results show that B. subtilis subsp. stecoris strains D7XPN1 and JS of cluster 2 share a genome similarity of 88.6 % with each other, 62.2–62.9 % with strains of cluster 1 represented by B. subtilis subsp. subtilis, and <51 % with strains from clusters 3 and 4, represented by B. subtilis subsp. inaquosorum and B. subtilis subsp. spizizenii, respectively (Fig. 4).

Phylogenomic analysis

Phylogenomic trees produced from the analysis of 1724 core protein sequence orthologs (436,410 aa) and the more conserved 534 protein sequence orthologs generated from CD-Hit analysis (≥97 % amino acid similarity) are presented in Figs. 5 and 6, respectively, and the Multi-Locus Sequence Analysis (MLSA) tree generated from the 6 genes routinely used in Multi-Locus Sequence Typing (MLST) of B. subtilis strains is shown in Fig. 7. All 3 trees resolve the Bacillus strains (n = 43) into 4 clusters with the same topology and with 100 % bootstrap values at each branch point of the clusters. The phylogenomic studies support the results from the genome-to-genome comparison studies of ANIb and DDH (Fig. 3), though there are slight changes in the topology of the internal branches of cluster 1 representing B. subtilis subsp. subtilis.

Fig. 5

Maximum Likelihood phylogenetic tree of B. subtilis (n = 43) constructed from a superalignment of sequences from 1724 protein orthologs of B. subtilis genomes using the hal pipeline with default parameters. Distances were corrected using the Jones-Taylor-Thornton (JTT) model with 1000 non-parametric bootstrap replicates. The bootstrap values are represented by numbers at nodes. Scale bar indicates 5 differences in every 1000 amino acids (0.5 %). Refer to Table 2 for the type strains in this figure

Fig. 6

Maximum Likelihood phylogenetic tree of B. subtilis (n = 43) produced from a superalignment consisting of conserved sequences of 512 protein orthologs (>97 %). Distances were corrected using the Jones-Taylor-Thornton (JTT) model with 1000 non-parametric bootstrap replicates and the values (in percent) are represented at each node. The scale bar represents a difference of 1 in every 1000 amino acids (0.1 %). Refer to Table 2 for the type strains in this figure

Fig. 7

Multi-locus sequence analysis (MLSA) maximum likelihood phylogenetic tree of B. subtilis produced from a superalignment of 8861 nucleotides of glpF, pta, purH, pycA, rpoD and tpiA. Distances were corrected using the model of Hasegawa et al. (1985) with 1000 non-parametric bootstrap replicates. The bootstrap values are represented at each node. Scale bar represents a difference of 1 in 100 nucleotides (1 %). Refer to Table 2 for the type strains in this figure


It is well established that 16S rRNA phylogeny does not readily separate closely related strains of Bacillus subtilis, and adding differentiating phenotypic characteristics to phylogenetic data does not necessarily assist in taxonomic delineation. For example, dark pigmented colonies are a distinctive feature of B. atrophaeus but some strains of B. subtilis also produce such colonies (Nakamura 1989). Several recent studies have discussed the need to establish microbial taxonomy on the basis of information retrieved from microbial genomes, especially when 16S rRNA similarity values are >97 %, and many computational methods, which can be categorised into two broad groups, have been reported in the literature (Bull et al. 2012; Larsen et al. 2014). Computational methods which rely purely on comparison of nucleotide sequences of genomes were recently termed Overall Genome Relatedness Indices (OGRI) methods by Chun and Rainey (2014) and include ANI, GGDC, GBDP (Genome Blast Distance Phylogeny) and the Maximal Unique Matches Index (MUMi), whereas methods which rely on comparison of conserved genomic features (e.g. core genes and proteins), where sequence disparities are the result of evolutionary pressures, are known as phylogenomic methods.

In this study we isolated 28 strains from a food-waste degrading commercial bioreactor, of which 9 isolates could only be assigned as strains of B. subtilis based on 16S rRNA sequence analysis (Fig. 1). We selected strain D7XPN1 as a representative of the 9 isolates, sequenced its genome and, using OGRI (ANIb and DDH) and phylogenomic methods, compared it with genomes of B. subtilis and B. subtilis-like strains that had been retrieved from the NCBI database. For this we initially downloaded 49 B. subtilis genomes but were left with only 43 after identifying and removing mixed or misclassified genomes. Of the remaining 43 B. subtilis genomes, 21 had already been used by Yi et al. (2014) in their studies, and our results support their conclusion that ANIb can be used to separate B. subtilis into 3 subspecies. They had further suggested that if the threshold ANIb value of 95–96 % were to be used for species delineation, then B. subtilis subsp. spizizenii and B. subtilis subsp. inaquosorum should be designated as new species. ANIb analysis of the additional 22 strains which were not part of the studies of Yi et al. (2014) revealed that 20 could be assigned to one of the 3 clusters defined by Yi et al. (2014), but a further 2 strains, JS and D7XPN1, were closely related to each other (ANIb value of 98.8 %) and formed a separate cluster, designated cluster 2. Cluster 2 was a sister branch of cluster 1 and was more closely related to it than to clusters 3 and 4 (ANIb values of 95.6, 93 and 92–94 %, respectively).

Based on the conservative cut-off DDH value of 70 %, the B. subtilis strains (n = 43) can be assigned to 4 clusters, each represented by a subspecies, which is consistent with the findings of the ANIb analysis. The results also confirm that strains D7XPN1 and JS are related more closely to each other than to members of cluster 1 represented by B. subtilis subsp. subtilis, which supports the results from the ANIb analysis. In addition, all 3 phylogenomic trees generated using 3 different data sets (1724, 534, and 6 core orthologs) resolve the Bacillus strains (n = 43) into 4 clusters with the same topology and with 100 % bootstrap values at each branch point of the clusters. The phylogenomic studies therefore support the results from the genome-to-genome comparison studies of ANIb and DDH, though there are slight changes in the topology of the internal branches of cluster 1 representing B. subtilis subsp. subtilis.

Phylogenomic studies indicate that there are differences in the gene and protein content of the 43 strains. Doolittle and Zhaxybayeva (2009) have hypothesised that the acquisition of genes can affect changes in an ecological niche and that even the closest relatives could be ecologically distinct ecotypes. Kopac et al. (2014) in their studies on B. subtilis subsp. spizizenii (cluster 4), which has 6 members, 4 of which were isolated from Death Valley, showed that all the genomes of the four Death Valley strains differed in gene content supporting the hypothesis of Doolittle and Zhaxybayeva (2009) that even the closest relatives could be ecologically distinct ecotypes. However, Kopac et al. (2014) were unable to demonstrate if the acquisition of genes could in fact change the metabolic dynamism of the ecological niche.

It would be interesting to extend the studies to B. subtilis subsp. subtilis (cluster 1), the most widely represented group of strains in B. subtilis, but the strains have been isolated from a very wide range of environments and therefore any differences found in the gene content could be considered to be biased by habitat differences. In the study reported here we have 9 strains of B. subtilis isolated from samples taken from the bioreactor on days 0, 1, 7, 17, 21, 29 and 49 of its operation cycle (Table 1), all of which are closely related (16S rRNA similarity >99 %). Strain D7XPN1 was isolated from a sample collected on day 7 and is a representative of the nine B. subtilis strains. Strain D7XPN1 is capable of growing at moderately thermophilic temperatures and contains an array of enzymes for degradation of polysaccharides, including a xylanase and an amylase. We intend to sequence and compare the genomes of all 9 strains, and if the hypothesis of Doolittle and Zhaxybayeva (2009) holds true then we should be able to see differences in the gene profiles which would potentially reflect the change in environmental conditions in the waste-degrading bioreactor from day 0 to day 49.

The genome of strain JS, isolated from the soil of a pot planted with Miscanthus sp., was sequenced and a number of genes associated with plant growth promoting and antifungal activities were identified (Song et al. 2012). In addition, volatile extracts of strain JS were found to reduce disease in bacterially infected tobacco plants (Kim et al. 2015). Our analysis of the annotations of the genome of strain D7XPN1 has also identified potential plant growth-promoting genes similar to those found in strain JS. Strain D7XPN1 therefore has potential for use as an inoculum source to improve and increase the efficiency of the food-waste degradation process at thermophilic temperatures, and additionally for use as a plant growth promoting fertilizer at the completion of the degradation process.

Strains JS and D7XPN1 have not yet been taxonomically validated, but based on the OGRI and phylogenomic results reported here, we propose to describe these two strains as members of a newly created subspecies that we designate B. subtilis subsp. stecoris, of which strain D7XPN1T is the type strain. Furthermore, we propose that once more strains of clusters 2, 3 and 4 have been isolated and their genome sequences analysed, then, if necessary, the reassignment of the B. subtilis strains should be reconsidered given the low ANIb (≤95–96 %) and DDH values (<70 %) which demarcate each of the four clusters.

Description of Bacillus subtilis subsp. stecoris subsp. nov

Bacillus subtilis subsp. stecoris [ gen. n. compost, from which the strain was isolated]. Grows optimally at 45 °C (range 24–50 °C) and pH 7 (range 5–9), and grows in the presence of 7 % NaCl. Facultative anaerobe that grows by fermentation. Forms white irregular colonies 1–2 mm in diameter when grown on dTSA; cells are straight rods 4–5 µm in length by 1 µm in width and stain Gram positive. Degrades potato starch by amylase activity and expresses partial β-xylanase activity, detected when grown on dTSA with X-β-D-xyloside.