Comparative genomic analysis of Bacillus paralicheniformis MDJK30 with its closely related species reveals an evolutionary relationship between B. paralicheniformis and B. licheniformis
Members of the genus Bacillus are important plant growth-promoting rhizobacteria that serve as biocontrol agents. Bacillus paralicheniformis MDJK30 is a PGPR isolated from the peony rhizosphere and can suppress plant-pathogenic bacteria and fungi. To further uncover the genetic mechanism of the plant growth-promoting traits of MDJK30 and its closely related strains, we used comparative genomics to provide insights into the genetic diversity and evolutionary relationship between B. paralicheniformis and B. licheniformis.
A comparative genomics analysis based on B. paralicheniformis MDJK30 and 55 other previously reported Bacillus strains was performed. The evolutionary position of MDJK30 and the evolutionary relationship between B. paralicheniformis and B. licheniformis were evaluated by studying the phylogeny of the core genomes, a population structure analysis and ANI results. Comparative genomic analysis revealed various features of B. paralicheniformis that contribute to its commensal lifestyle in the rhizosphere, including an opening pan genome, a diversity of transport and the metabolism of the carbohydrates and amino acids. There are notable differences in the numbers and locations of the insertion sequences, prophages, genomic islands and secondary metabolic synthase operons between B. paralicheniformis and B. licheniformis. In particular, we found most gene clusters of Fengycin, Bacitracin and Lantipeptide were only present in B. paralicheniformis and were obtained by horizontal gene transfer (HGT), and these clusters may be used as genetic markers for distinguishing B. paralicheniformis and B. licheniformis.
This study reveals that MDJK30 and the other strains of lineage paralicheniformis present plant growth-promoting traits at the genetic level and can be developed and commercially formulated in agriculture as PGPR. Core genome phylogenies and population structure analysis has proven to be a powerful tool for differentiating B. paralicheniformis and B. licheniformis. Comparative genomic analyses illustrate the genetic differences between the paralicheniformis-licheniformis group with respect to rhizosphere adaptation.
KeywordsPGPR Bacillus paralicheniformis Bacillus licheniformis Comparative genomics Secondary metabolites
Average nucleotide identity
- B. licheniformis
- B. paralicheniformis
- B. sonorensis
- B. subtilis
Bayesian Analysis of Population Structure
Blast score ratio
chrome azurol S
Coding DNA sequence
Cluster of orthologous group
clustered regularly interspaced short palindromic repeats
- E. coli
General Time Reversible
horizontal gene transfer
- lineage L
- lineage P
Nonribosomal peptide synthetases
Potato Dextrose Agar
Plant growth-promting rhizobacteria
Plant growth-promoting rhizobacteria (PGPR) are a group of rhizosphere bacteria that promote plant growth . PGPR competitively colonize the plant rhizosphere and can simultaneously act as biofertilizers and as competitors of plant-pathogenic bacteria and fungi . The genus Bacillus comprise typical species of PGPR that can suppress some plant pathogens by producing antibiotics, including Bacilysin , Difficidin , Iturin A , and Surfactin . Some Bacillus strains can be used as soil inoculants in agriculture and horticulture for plant growth promotion and biocontrol. Bacterial secondary metabolism is a rich source of novel bioactive compounds with potential pharmaceutical applications as antibiotics . Antibiotic production by Bacillus spp. enhances the fitness of the production strains and suppresses plant pathogens that would otherwise harm plant health. These antibiotics mainly belong to nonribosomal peptides and polyketides, which are synthesized by nonribosomal peptide synthetases (NRPS) and polyketide synthases (PKS), respectively . NRPS and PKS are modular and composed of a series of domains including adenylation, thiolation, condensation and esterification domains .
B. paralicheniformis is a gram-positive, facultative anaerobic, motile Bacillus species . It has been reported that B. paralicheniformis is most closely related to B. licheniformis and B. sonorensis, based on phylogenetic analysis . B. paralicheniformis and B. licheniformis have been used for decades in the biotechnology industry to manufacture enzymes, antibiotics, biochemicals and consumer products [10, 11]. Some strains of these species are used to produce peptide antibiotics such as Bacitracin, Fengycin, Lichenysin and Lantipeptide [12, 13]. Thus, some isolates within the paralicheniformis-licheniformis group can mitigate the effects of fungal pathogens on field crops [14, 15]. Dunlap et al. have distinguished between B. paralicheniformis and B. licheniformis based on phylogenetic and phenotypic analyses. However, phylogenetic and genetic characteristics may be ambiguous within the paralicheniformis-licheniformis group.
B. paralicheniformis MDJK30 was recently isolated and identified from the rhizosphere of peony in Shandong, China. Its complete genome sequence has been reported . To reveal the evolutionary relationship between B. licheniformis and B. paralicheniformis, in this study, MDJK30 together with 55 other previously published Bacillus strains was comparatively studied by phylogenetic analysis, population genetic structure analysis and ANI results. Functional and comparative genomics analysis provided a better understanding of genome evolution and hereditary differences within the paralicheniformis-licheniformis group. Our results revealed genomic differences in secondary metabolic gene clusters between B. paralicheniformis and B. licheniformis, suggesting their roles in adaptation to various antibiotic stressors in the rhizosphere.
General features of B. paralicheniformis MDJK30 as PGPR
Secondary metabolism gene clusters in MDJK30
MDJK30 has the potential to synthesize bioactive secondary metabolites. Our comparative genomic analysis of MDJK30 identified 11 gene clusters (Fig. 2) related to secondary metabolites using antiSMASH . There were four gene clusters (Lichenysin, Fengycin, Bacitracin, Bacillibactin) belonging to non-ribosomal peptide synthesis. The representative gene clusters encoding NRPS and the putative product structure are summarized in Additional file 2: Figure S1A. Lichenysin is a hemolytic surfactin-like toxin with activity against gram-positive and gram-negative bacteria . Fengycin, as a cyclic lipopeptide, is reportedly to have strong antifungal activity, specifically against filamentous fungi . The iron-siderophore Bacillibactin may enhance the ability of MDJK30 to scavenge iron from the rhizosphere . The other seven secondary metabolism gene clusters include Lantipeptide, Bacteriocin, Siderophore, Terpene, Lassopepetide, T3pks and Other (unknown). These biosynthetic gene clusters are represented in Additional file 2: Figure S1B. Furthermore, we compare the homology of these gene clusters. The homologous gene clusters of MDJK30 are represented in Additional file 3: Figure S2. Notably, some gene clusters in MDJK30 show high homology to those from other species, such as B. halodurans, B. subtilis and Pontibacillus litoralis. These data suggest that the secondary metabolism gene clusters were horizontally transferred among B. paralicheniformis and other species.
Lineage L contains 37 strains identified as the species licheniformis. MDJK30 and the other 13 strains form a monophyletic lineage P and evolve from a common ancestor. Meanwhile, 6 strains of lineage P were previously identified as species licheniformis. Our phylogenetic analysis and ANI results indicated that there are mistakes in the classification of strain B4121, B4125, B4123, LMG 6934, LMG 7559 and S127, which should be corrected to species paralicheniformis. This mis-classification was also reported in other strains (B. paralicheniformis ATCC 9945a were initially identified as B. licheniformis) and have generally been corrected with advances in technology [22, 23, 24].
A total of 1718 single-copy core genes (Additional file 7: Table S4) were identified by comparison of the 14 lineage P genomes. The strains of lineage P have similar genetic characteristics, with more core genes and similar genome sizes (3.9–4.5 Mb), and the numbers of CDSs range from 4026 to 4859. We constructed a maximum likelihood tree based on the alignment of nucleotide sequences for 1718 single-copy genes shared by all lineage P strains. Furthermore, we also constructed a Neighbor-joining tree using MEGA for 56 Bacillus strains and all lineage P strains, respectively (Additional file 8: Figure S3). The core genome phylogeny of lineage P illustrates the diversity of B. paralicheniformis. The phylogenetic tree of lineage P was divided into three distinct sublineages with high bootstrap support values (Fig. 3b). MDJK30 and five other strains (NCIMB_8874, LMG_7557, ATCC_9954a, KJ-16, and S127) form a sublineage I suggest that MDJK30 is closely related to these five strains. Sublineage II contains two strains (B4125 and BL-09), and sublineage III contains 6 strains (ATCC_12759, B4121, LMG_6934, B4123, MKU3, and F47).
Comparative genomic analysis
We further used Cluster of Orthologous Group (COG) assignment  to determine into which functional category the genes of MDJK30 and core genome of lineage P fall. The number of genes assigned to each COG is present in Fig. 4b. As show in COG assignment, a higher proportion of the genes of MDJK30 were assigned to the K (9.8%, transcription), G (10.0%, carbohydrate transport and metabolism) and E (9.3%, Amino acid transport and metabolism) categories. The core genome of lineage P has a consistent functional category by COG compared with MDJK30, and as indicated, despite their geographical isolation and varied associated-plant, the majority of genes implicated in rhizosphere adaptation and competitiveness were highly conserved among the B. paralicheniformis strains.
The paralicheniformis-licheniformis group includes many strains that are widely used in biotechnology industry and field applications. Their genomes harbor many gene clusters related to secondary metabolic synthases. The availability of many genomes from the paralicheniformis-licheniformis group should permit a thorough comparison of these secondary metabolic synthases in B. paralicheniformis and B. licheniformis, thereby offering new perspectives and strategies for facilitating future applications in agriculture and industry. Genome mining identified a large number of gene clusters related to secondary metabolic synthases in the paralicheniformis-licheniformis group. To further elucidate the evolution and distribution of these gene clusters, we located and screened these gene clusters in the pan genome and identified the genomic events contributing to the differentiation of rhizosphere adaptation between B. paralicheniformis and B. licheniformis during the evolutionary process. The distribution heatmap and possible evolutionary node are represented in Fig. 5b. As shown in Fig. 5b, lineage L, lineage P and lineage 2 have common, identical biosynthesis gene clusters (Lichenysin, Bacillibactin, Bacteriocin, Siderophore, Lassopeptide, part of Terpene and Fengycin), in agreement with previous studies [10, 27]. In some secondary metabolism gene clusters, additional features such as adjacent tRNA, prophage, transposases, insertion sequence and CRISPRs are indicative of HGT (Fig. 2). Our analysis shows that most strains of lineage P contain consistent biosynthetic genes, suggesting that despite their geographical isolation and varied associated plants, the majority gene clusters implicated in rhizosphere adaptation and competitiveness were highly conserved among B. paralicheniformis strains. In addition, some strains do not contain certain gene clusters, due to a simple deletion event.
Interestingly, two non-ribosomal peptide and one specific Lantipeptide synthase genes found in lineage P but absent in lineage L are those involved in the synthesis of Bacitracin, Fengycin and paralichenicidin (Fig. 5b). The deviant GC content is used as a detection method for HGT . We detected the GC content of secondary metabolism gene clusters, which displayed an apparent deviation with their host genomes (Additional file 9: Figure S4). Five strains of lineage L also have Lantipeptides, suggesting that two additional HGT events occurred within lineage L. These gene clusters (Fengycin, Bacitracin and Lantipeptide) were performed to explore the differences in the categorical genome and rhizosphere adaptation between B. paralicheniformis and B. licheniformis.
Specific secondary metabolism gene clusters in B. paralicheniformis
Interestingly, there is an incomplete Fengycin cluster found in some strains of B. subtilis and B. amyloliquefaciens, such as B. subtilis ATCC 13952 and B. amyloliquefaciens DSM 7. In this incomplete Fengycin cluster, the biosynthetic genes for the fenA-C and part of fenD modules were absent. The strains ATCC 13952 and DSM 7 are non-plant-associated . Moreover, most strains containing the complete Fengycin cluster are plant-associated strains . Therefore, we hypothesized that gene loss events related to the biosynthetic genes of Fengycin production occurred in non-plant-associated strains. This hypothesis consists with previous studies that Fengycin prevents the growth of a wide range of plant pathogens, especially filamentous fungi .
The Bacitracin operon in B. paralicheniformis MDJK30 contains multiple genes, namely, Bacitracin synthetase (bacA: BLMD_12,970, bacB: BLMD_12,965 and bacC: BLMD_12,960), thioesterase (bacT: BLMD_12,975), those of a two-component regulatory system (bacR: BLMD_12955 and bacS: BLMD_12950), an ABC transport gene (bacE: BLMD_12945) and two other genes (bacG: BLMD_12935 and bacF: BLMD_12,940), that may not be involved in Bacitracin production. In previous studies, there was observed variation in the prevalence of Bacitracin synthase genes among B. licheniformis strains, and approximately 50% may harbor the bac operon [11, 33]. More remarkably, the bac operon is not present in the reference strain (DSM 13/ATCC 14580) genome . In this study, our comparative genomic analysis clearly illustrates the distribution of the bac operon in the paralicheniformis-licheniformis group, which indicates that this operon is present only in the strains of lineage P. These genes are also absent from all strains of lineage L. A previous study showed that erythromycin resistance of paralicheniformis-licheniformis group isolates is independent of Bacitracin production . This indicates that Bacitracin production and erythromycin resistance could be an important characteristic specific to B. paralicheniformis. and B. subtilis has been shown to produce a Bacitracin called Subpeptin . The Bacitracin synthetase of lineage P displayed a significant sequence similar to subpeptin with approximately 97–100% identity to the protein sequence (Fig. 6b). Genes for Subpeptin synthesis have been described for B. subtilis JM4 but are absent in most B. subtilis strains. It is likely that the gene cluster for Subpeptin synthesis was acquired from B. paralicheniformis through a recent single event of HGT. Furthermore, we found a putative Bacitracin operon in B. sonorensis, B. thuringiensis and B. cereus using the ClusterBlast service in antiSMASH . The Bacitracin cluster is only found in some species of the cereus and subtilis groups, and is absent in the flexus, coagulans and other groups. The phylogeny of this cluster was consistent with the Bacillus species tree, as expected under the hypothesis the Bacitracin cluster was transferred from a donor closely related to the ancestor order of cereus, coagulans and subtilis groups via an HGT event (Fig. 7). The Bacitracin synthetase of B. sonorensis, B. thuringiensis and B. cereus revealed 76–78%, 61–62% and 61–62% identity with the Bacitracin operon of B. paralicheniformis, respectively (Fig. 6b). Although the synthetase genes of the Bacitracin operons in different strains show a certain degree of divergence in sequence, the substrates of the A domains were conserved. As shown in Fig. 6b, the substrates of the A domains in BacA, BacB and BacC were determined to be ile-cys-leu-glu-ile, lys-orn and ile-phe/d-trp-his-asp-asn, respectively. More remarkably, the amino acids of the second A domain in BacC could be modified phe or d-trp, which produced two types of Bacitracin (ile-cys-leu-glu-ile-lys-orn-ile-phe-his-asp-asn and ile-cys-leu-glu-ile-lys-orn-ile-d-trp-his-asp-asn). This difference might hint at the biological differences in Bacitracin production in different strains. bacR and bacS are conserved in all Bacitracin operons of different strains, and their products make up the regulator and sensor proteins of the two-component systems. This two-component system plays a key role in the Bacitracin self-resistance of the host strains producing Bacitracin [36, 37]. bacE, bacF, bacG and bacT are absent in B. thuringiensis and B. cereus, indicating they might not be important to Bacitracin production. Furthermore, in B. cereus ATCC 10876, the Bacitracin operon is located between two types of IS elements, IS6 and IS3. It appears likely that the acquisition of this Bacitracin operon is mediated by its flanking IS elements. In general, these data suggest that the two-component systems (bacR and bacS) and three synthetase genes (bacA, bacB and bacC) make up the basic interchangeable genetic elements of the Bacitracin operon.
Lantipeptide represents a potential antimicrobial strategy against a range of pathogenic bacteria . The Lantipeptide in MDJK30 is a two-peptide Lantipeptide, named Paralichenicidin. The gene cluster of Paralichenicidin is located on both strands at the start of the genome and covers 162,067 to 179,818 bp (BLMD_00955 to BLMD_01025) (Fig. 6d). The GC content (38.6%) of the cluster is quite different from the average GC content of the whole genome (46.7%) (Additional file 9: Figure S4). Analysis of the Paralichenicidin operon identified three transport proteins (BLMD_01000: LanT1, BLMD_00990: LanT2 and BLMD_00970: LanT3), two Lantipeptide structural proteins (BLMD_01010: LanA1 and BLMD_01020: LanA2), three modification enzymes (BLMD_01005: LanM1, BLMD_01015: LanM2 and BLMD_00955: LanP) and one regulator (BLMD_01025: LanR). In comparison to Paralichenicidin and Lichenicidin, the core peptide of PicA1 and PicA2 revealed 41 and 59% identity with the LicA1 and LicA2 of lichenicidin, respectively (Fig. 6c). Interestingly, five strains of lineage L have two Lantipeptide-encoding gene clusters, namely, for Lichenicidin and Paralichenicidin (Fig. 5b). The five Paralichenicidin-encoding gene clusters in lineage L strains may thus be the result of multiple HGT from the B. paralicheniformis strains.
To investigate the putative homologs of Paralichenicidin, we performed a genome-wide examination of the LanMs associated with Paralichenicidin synthetase analogues, which involved a PSI-BLAST search of the NCBI non-redundant protein database. All results with significant E values were examined and further manually reviewed according to the gene orders and genes downstream and upstream of the LanM homologs. Of the potential homologs, three were selected for closer inspection, including B. amyloliquefaciens UMAF6639 (Amyloliquecidin, BAMY6639_13865 to BAMY6639_RS13795), Streptococcus pneumoniae SP23-BS72 (CGSSp23BS72_03568 to CGSSp23BS72_03608), and Streptomyces himastatinicus ATCC 53653 (SSOG_RS27855 to SSOG_RS27820) (Fig. 6d).
Amyloliquecidin, a known two-component Lantipeptide produced by B. amyloliquefaciens, is a recommended treatment for Staphylococcus aureus-induced skin infections . Two propeptides (AmyA1|BAMY6639_RS13800: LanA1 and AmyA2|BAMY6639_RS19790: LanA2) revealed 79 and 72% identity with PicA1 and PicA2, respectively. As with Paralichenicidin, AmyA1 and AmyA2 both contain possible leader regions, which end with GC and GA leader cleavage sites, respectively (Fig. 6d). Notably, a small ORF (BAMY6639_RS13810) near AmyA2 in B. amyloliquefaciens UMAF6639 showed 58 and 51% identity with PicA2 and AmyA2. In general, Amyloliquecidin of B. amyloliquefaciens revealed 58–82% identity with Paralichenicidin and remained consistent genes downstream and upstream of Paralichenicidin.
The other two possible relative gene clusters studied are located in S. pneumoniae and S. himastatinicus, respectively. S. pneumoniae SP23-BS72 is a clinical pneumonia-associated isolate . Concerning its putative Lantipeptide-encoding gene cluster, as with Paralichenicidin, a lanR gene (CGSSp23BS72_03608) encoding a MutR family transcriptional regulator is located upstream of the gene cluster, but no sequence identity was detected between LanR and PicR. Downstream of the gene cluster, there are two putative LanT determinants (CGSSp23BS72_ 03643: LanT1 and CGSSp23BS72_03553: LanT2) that show 39 and 43% identity with PicT1 and PicT2, respectively, and the other four hypothetical proteins show significant homology to the homologous proteins of Paralichenicidin (Fig. 6d). Moreover, the order of LanT1, LanT2 and four hypothetical proteins is also consistent with Paralichenicidin. Significantly, LanM2 has an apparent frameshift mutant (CGSSp23BS72_03633 and CGSSp23BS72_03638) and the propeptides of the two-peptide Lantipeptide are encoded by two neighboring small ORFs between LanM1 and the frameshifted-LanM2. Two propeptides (CGSSp23BS72_03623: LanA1 and CGSSp23BS72_03628: LanA2) revealed 40 and 33% identity with PicA1 and PicA2, respectively.
S. himastatinicus ATCC 53653 was isolated from soil according to the ability to produce Himastatin . With respect to its putative Lantipeptide-encoding gene cluster, a putative LanT determinant (SSOG_RS27830: LanT1, 35% identical to PicT1) as well as two putative propeptide determinants (SSOG_RS27830 and SSOG_RS27830, 37% identical to PicA1 and 36% identical to PicA2, respectively) are located in close proximity to two putative LanM1- and LanM2-encoding genes (SSOG_RS27830 and SSOG_RS27830, 26% identical to PicM1 and 26% identical to PicM2, respectively) (Fig. 6d). The order of these genes is the same as the homologous genes of Paralichenicidin. In general, the homologous relationship between these two putative Lantipeptide-encoding gene clusters and Paralichenicidin is, therefore, less likely due to recently HGT or convergent evolution. They seem to evolve from an ancient ancestor with a long time for subsequent divergent evolution.
As a result, sequence comparisons of these propeptides were performed with Paralichenicidin to determine whether this conservation extended to the putative homologous genes. The LanA1 peptides display a high level of conservation, especially in the lanthionine and methyllanthionine bridge-forming regions (Fig. 6e). The conservation reflects the shared mode of action, specifically in binding to lipid II . LanA2 displays great levels of divergence, and only shows a low degree of conservation in the N-terminal regions (Fig. 6e). The divergence of LanA2 reflects the broader role of LanA2 in membrane insertion, and the low conservation of N-terminal regions likely plays an important role in membrane insertion and pore formation .
Considering the possible important role of Paralichenicidin in the antimicrobial strategy, it was particularly interesting to determine whether the core peptide coding regions of LanA1 and LanA2 were undergoing selection pressure. Thus, the codeml program of PAMLX  was used to carry out the analysis of positive selection. Based on the LRT statistic for comparing model M1 (neutral) with model M2 (selection) and model M7 (neutral) with model M8 (selection), the core peptide coding regions of LanA1 and LanA2 were identified to be under negative selection with the ω < 1. In addition, none positively selected sites were determined. The results of positive selection indicate the LanA1 and LanA2 have been conserved during evolution.
In this study, we determined the antagonistic activity of B. paralicheniformis MDJK30 to fungi and bacteria and tested some other general characteristics as PGPR. MDJK30 showed high antagonistic activity to F. solani and B. subtilis, which indicates the potential application for controlling pathogenic fungi and gram-positive bacteria. As a member of PGPR, MDJK30 could also produce siderophores and show casein degradation activity. B. paralicheniformis species have long been known for their ability to produce secondary metabolites. Here, our comparative genomic analysis identified 11 gene clusters related to secondary metabolites in MDJK30, including Lichenysin, Fengycin, Bacitracin, Bacillibactin, Lantipeptide, Bacteriocin, Siderophore, Terpene, Lassopepetide, T3pks and others (unknown). As MDJK30 colonizes plant rhizospheres, it directly inhibits the growth of plant-pathogenic bacteria or fungus though production of antimicrobial peptides or depriving them of essential iron. MDJK30 is known to produce siderophores (Fig. 1d), and this capacity contributes to their fitness in iron-limited environments . Lantipeptide represents a potential antimicrobial strategy against a range of pathogenic bacteria . The other gene clusters might be related to the production of new antimicrobial compounds. Interestingly, the presence of a large number of mobile genetic elements and some gene clusters related to secondary metabolites in MDJK30 show high homology to those from other species, suggesting that these clusters were horizontally transferred among B. paralicheniformis and other species.
We then performed comparative genomics analysis based on B. paralicheniformis MDJK30 and 55 other previously published Bacillus strains. The phylogenetic analysis and population genetic structure based on core genomes showed that most strains fall predominantly into two distinct phylogenetic lineages designated lineages L (lineage licheniformis) and P (lineage paralicheniformis) and MDJK30, and the other 13 strains formed lineage P; lineage P and L form distinct and extremely tight clusters on separate clades from other strains (Fig. 3). Our phylogenetic analysis revealed the evolutionary position of MDJK30 and the explicit evolutionary relationship among B. licheniformis and B. paralicheniformis, suggesting that species-specific genome segments of B. licheniformis and B. paralicheniformis have occurred during adaptation to different niches, and co-evolution with plants, plant-pathogenic bacteria and fungus could be an important driving factor . Furthermore, six mis-classification paralicheniformis strains were distinguished from the licheniformis and were also supported by the ANI result.
In addition, our comparative genomic analysis indicated that the pan-genomes of lineage P are open. Based on the COG analysis, MDJK30 contains a larger proportion of genes involved in the transcription (category K), transport and metabolism of carbohydrate and amino acid (categories G and E). The genes related to rhizosphere adaptation during PGPR-plant interactions were mainly involved in central metabolism, secretion, detoxification and stress . The diversity of carbohydrate and amino acid metabolism improves genetic fitness to adapt to the specific nutrient environment in agricultural application . The core genome of lineage P has a consistent functional category by COG compared with MDJK30 and indicated that despite their geographical isolation and varied associated plants, the majority of genes implicated in rhizosphere adaptation and competitiveness were highly conserved among B. paralicheniformis strains.
Our comparative and evolutionary analyses of the evolutionary origin and distribution of secondary metabolism gene clusters among paralicheniformis-licheniformis group in pan genome revealed the commonality and differences in rhizosphere adaptation between B. paralicheniformis and B. licheniformis (Fig. 5), implying that a specific niche, such as the plant-associated rhizosphere, influences HGT processes and gene loss [46, 47]. Interestingly, two non-ribosomal peptides and one specific Lantipeptide synthase gene were found in the lineage P and were absent in other lineages, including most lineage L, are those that involved in the synthesis of Bacitracin, Fengycin and Paralichenicidin. As the secondary metabolism gene clusters displayed an apparent deviation in GC content, it is likely that they were acquired through HGT events. The genetic organization of the gene clusters was closely related to the gene cluster for Fengycin and Bacitracin biosynthesis, suggesting that they have evolved from a common ancestor and/or might be interchangeable genetic elements (Fig. 6). For the Fengycin and Bacitracin clusters, we deepened the comparative genomic analysis by performing a phylogeny. The phylogenetic analysis revealed Fengycin cluster distributed in a narrow cluster of subtilis group, Bacitracin cluster distributed in some species of subtilis and cereus groups (Fig. 7). The phylogenetic trees of cluster and specie presented in this work, the conserved genetic organization and uncommon cluster origin, suggesting that the Fengycin and Bacitracin clusters may have been originated via an HGT event from the donor closely related to the subtilis order and the ancestor order of cereus, coagulans and subtilis groups, respectively (Fig. 7).
Cyclic lipopeptides produced by Bacillus strains have been reported to protect host plants from many pathogens. The representative families of these cyclic lipopeptides (Surfactin, Fengycin and Iturin) share a polypeptide ring linked to a lipid tail of various length [48, 49]. Fengycin prevents a large number of plant pathogens, especially filamentous fungi . Interestingly, there is an incomplete Fengycin cluster in some non-plant-associated strains, indicating gene loss events related to the biosynthetic genes of Fengycin production. Bacitracin is a broad-spectrum gram-positive bacterial antibiotic used extensively as a feed additive that is synthesized by the non-ribosomal peptide synthase bac operon by several strains of B. licheniformis and B. subtilis [11, 33]. Our comparative genomic analysis clearly illustrates the distribution of the bac operon in paralicheniformis-licheniformis group, indicating that this operon is present only in the strains of lineage P (Fig. 5b). These genes are also absent from all strains of lineage L. This indicates that Bacitracin production and erythromycin resistance could be an important characteristic specific to B. paralicheniformis.
Paralichenicidin is a two-peptide Lantipeptide. Comparison to Lichenicidin, Paralichenicidin shows differences in genetic organization and sequence similarity. We then investigated the putative homologs of Paralichenicidin. The result showed that amyloliquecidin of B. amyloliquefaciens revealed 58–82% identity with Paralichenicidin and a consistent gene order of Paralichenicidin (Fig. 6d), possibly indicating that the Lantipeptide of B. amyloliquefaciens and B. paralicheniformis are evolved from a common ancestor. Such differences display the evolutionary divergence between B. amyloliquefaciens and B. paralicheniformis. There are two other possible relative gene clusters located in S. pneumoniae and S. himastatinicus, respectively. We compared the protein sequences of these propeptides of Paralichenicidin to the putative homologous proteins (Fig. 6e). The LanA1 peptides display a high degree of conservation, reflecting the shared mode of action in specifically binding to lipid II . LanA2 displays a high level of divergence, and the low conservation of the N-terminal regions likely plays an important role in membrane insertion and pore formation . Based on the analysis of positive selection, the lanA1 and lanA2 of Paralichenicidin were identified to be under negative selection with the ω < 1, and indicating they were conserved during evolution.
As a result, Fengycin production, erythromycin resistance related to Bacitracin production, and Paralichenicidin production, may therefore be useful for differentiating B. paralicheniformis from B. licheniformis. The phylogenetic analysis between cluster tree and species tree, narrow distribution, conserved genetic organization and uncommon cluster origin, suggesting that these clusters may have been originated via HGT event. These results indicate that B. paralicheniformis has evolved specific genomic and metabolic features and obtained gene clusters and metabolites that are distinct from B. licheniformis. Our data provide differences at molecular level for the production of secondary metabolites between B. paralicheniformis and B. licheniformis.
B. paralicheniformis MDJK30 possesses several beneficial properties and has potential to enhance plant growth. The complete genome sequence of B. paralicheniformis MDJK30, along with the comparative genomics analysis, suggest that MDJK30 has diverse metabolic pathways and can utilize various energy sources. It can also synthesize siderophores for iron competition. Thus, MDJK30 can be developed and commercially formulated, either alone or as part of microbial consortia, for field application to control plant pathogens and promote crop growth. The explicit classification status of B. licheniformis and B. paralicheniformis was also confirmed by phylogenetic analysis, population structure analysis and ANI values. The most striking difference in the aspect of B. paralicheniformis is the presence of Bacitracin, Fengycin and Paralichenicidin synthetases. These synthetases are apparently not present in B. licheniformis. The origin and evolution of secondary metabolites clusters is another interesting topic, and results suggest that HGT events may have shaped the metabolic potential of B. paralicheniformis. Our study offers a new perspective regarding genomic differences between B. paralicheniformis and B. licheniformis in terms of rhizosphere adaptation.
B. paralicheniformis MDJK30 was isolated from the rhizosphere of a peony and preserved in our lab (Shandong Key Laboratory of Agricultural Microbiology, Shandong Agricultural University). The complete genome sequence of MDJK30 was sequenced and deposited at GenBank (accession number CP020352) . Our collection for constructing core genome phylogeny included 9 B. paralicheniformis strains, 46 B. licheniformis strains and 1 B. sonorensis strains (Fig. 3). Twenty other Bacillus genomes were used in constructing phylogeny of Species, Fengycin, and Bacitracin (Fig. 7). All the genomes of other strains were downloaded from the NCBI GenBank database. Bacterial strains used in this study are presented in Additional file 4: Table S2.
Analysis of the antagonistic activity
A dual culture assay  was performed to analyze the antagonistic activity of Bacillus paralicheniformis MDJK30. Fusarium solani from the root rhizosphere of peony was used to test the antifungal activity of MDJK30. Hyphal plugs of F. solani were cut from a new cultivation and placed in the center of a PDA plate to incubate for 1 day at 28 °C. Then, a single clone of MDJK30 was inoculated onto one side of the plug at a distance of 2 cm to incubate for another 3 days at 28 °C. The model organisms E. coli DH5α and B. subtilis 168 were used for antibacterial tests of MDJK30. The precultured E. coli or B. subtilis were then cultured in 5 mL LB liquid medium for 10 h at 37 °C. Then, 1 mL of the culture was mixed with 20 mL LB semisolid medium, and a single colony of MDJK30 was then placed on the center of the cooled medium to incubate for 1 day at 37 °C. The inhibition zones were measured.
Qualitative analysis of siderophores
Single clones of B. paralicheniformis MDJK30 were cultivated on LB plates overnight at 37 °C. The bacterial lawn was inoculated on a CAS-agar plate for qualitative analysis of siderophores, as reported .
Casein degradation assay
Casein degradation of B. paralicheniformis MDJK30 was tested on the casein medium, which contained 1% (w/v) casein, 0.3% (w/v) beef extracts, 0.5% (w/v) NaCl, 0.2% (w/v) K2HPO4, 2% (w/v) agar, and 0.005% (w/v) bromothymol blue, pH 7.3–7.5. Single clones of MDJK30 were inoculated on the casein medium and cultivated for 2 days at 37 °C to observe the transparent zones that indicated the capacity for degrading casein.
Identification of gene orthologous group
Orthologous groups were delimited using OrthoFinder , in which all protein sequences were compared using a BLASTP all-against-all search with an E-value cutoff of 1e-3. The single-copy core gene, pan genome and core genome set were extracted from the OrthoFinder output. Nucleotide sequences of the single-copy core genes were extracted according to protein ID in the OrthoFinder output.
The phylogenetic relationship between the Bacillus strains was predicted by analyzing the set of SNPs present in all single-copy core genes across genomes. The SNPs were integrated according to the arrangement of the single-copy genes on MDJK30 genome. Homologous recombination caused by HGT occurs in bacterial populations and can confound phylogenetic analysis. Extracting the set of SNPs, we identified and removed putative regions of recombination, using the tool CloneFrameML . Phylogenetic trees were constructed using two methods: Maximum likelihood (ML) and Neighbor Joining (NJ). The nucleotide sequence of genes used in the phylogenetic tree were aligned using MAFFT . ML tree was constructed using PhyML , with the GTR (General Time Reversible) model of nucleotide substitution, c-distributed rates among site. NJ tree was computed by applying a Poisson model available with 100 bootstrap replicates and uniform rates in MEGA7 . MEGA7 and FigTree 1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/) were employed to show the trees. The choice of models was based on previous studies [56, 57].
Population structure analysis
The population genetic structure of 56 genomes was investigated using the software BAPS (version 6)  and STRUCTURE (version 2.3.4)  based on SNPs identified from the alignment of 528 single-copy core genes shared by the 56 genomes and integrated according to the arrangement of the genes on MDJK30 genome. BAPS assigns strains to inferred populations (K), representing the best fit for the observed genetic variation. We varied K from 2 to 10 and ran the experiment three times to confirm the clustering results. We ran the program STRUCTURE at values of k (the number of subpopulations 2–10) and Rep (repeats 5). STRUCTURE assumed k = 6 subpopulations and correlated allele frequencies, linkage model based on maker distances in base pairs, 10,000-iteration burnin and 10,000 iterations of sampling.
Average nucleotide identity (ANI) analysis
Average nucleotide identity (ANI), a method that can be applied to delineate species, was calculated to determine diversity at the genomic level . JSpecies1.2.1 was used to analyze these genome sets for ANI and tetramer usage pattern, using the default parameters .
Functional category of genes
We analyzed the functional category of the genes of MDJK30 and core genome of lineage P using Cluster of Orthologous Group (COG) assignment . The functional annotation of proteins was performed by alignment against the COG database of NCBI using amino acid sequences.
The secondary metabolisms analysis
The gene cluster related to secondary metabolism was identified and the putative structure was deduced using antiSMASH on the default parameters  (http://antismash.secondarymetabolites.org). The results obtained from genomic sequence correlated with NRPS pathway consisted of detailed functional domain annotation, predicted core structure, and the levels of genomic identity to known homologous gene clusters catalogued in the Minimum Information on Biosynthetic Gene Cluster (MIBiG).
Genomic features were visualized using DNAplotter . The circular maps were generated using BLAST Ring Image Generator (BRIG) software . The PHAge search tool (PHAST) was utilized to find the prophages . Genomic islands were predicted using IslandViewer . Insertion sequences were predicted using the IS Finder database . The clustered regularly interspaced short palindromic repeats (CRISPRs) were predicted with the CRISPR recognition tool (CRT) .
To examine the evolutionary origin and distribution of the secondary metabolism gene cluster, we located and screened the biosynthetic and transport-related genes and regions of gene clusters using the LS-BSR tool .
Genome mining of LanM-like enzymes
To identify LanM-like enzymes of Paralichenicidin, PSI-BLAST searches of the NCBI non-redundant protein database were performed using the LanM protein sequences as the query. Hits were selected with E-values lower than 1e-6.
Analysis for positive selection
MEGA  was used to construct maximum likelihood phylogenetic trees with the GTR model of nucleotide substitution for the core peptide coding region of LanA1 and LanA2. The resulting ML trees were applied to subsequent selection analysis. Positive selection was assessed in the program codeml in the PAMLX . Positive selection in coding regions can be estimated by calculating the ratio of the nonsynonymous substitution rate to the synonymous substitution rate (dN/dS, represented by the omerga parameter in codeml.). The likelihood ratio test (LRT) were then carried out to infer the occurrence of core peptide coding region under positive selection pressure through two different comparisons of model M1 (neutral) with model M2 (selection) and model M7 (neutral) with model M8 (selection). P values were determined from the LRT scores calculated by the module Chi-square of the PAMLX .
This work was supported by the National Key Research and Development Program of China (No. 2017YFD0200804), the National Natural Science Foundation of China (NSFC, grants 31700094, 31770115, and 31600090), the National Science and Technology Pillar Program of China (2014BAD16B02), and the funds of Shandong “Double Tops” Program (SYL2017XTTD03). The above funds played no role in study design, data analysis, and manuscript preparation. The opinions expressed in this paper are those of the authors.
Availability of data and materials
All sequences and annotations referenced in the manuscript are publically available on the GeneBank Database at the accession number provided (Additional file 4: Table S2).
BD and CW designed the research. YD, JM, and ZY performed the laboratory work and analyzed the data. ZY wrote the manuscript and CW revised. KL, GY, WX, and LF advised the manuscript. YD supported the study. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
- 31.Borriss R, Chen XH, Rueckert C, Blom J, Becker A, Baumgarth B, et al. Relationship of Bacillus amyloliquefaciens clades associated with strains DSM 7T and FZB42T: a proposal for Bacillus amyloliquefaciens subsp. amyloliquefaciens subsp. nov. and Bacillus amyloliquefaciens subsp. plantarum subsp. nov. based on complete genome sequence comparisons. Int J Syst Evol Microbiol. 2011;61:1786–801.CrossRefGoogle Scholar
- 40.Kumar Y, Goodfellow M. Five new members of the Streptomyces violaceusniger 16S rRNA gene clade: Streptomyces castelarensis sp. nov., comb. nov., Streptomyces himastatinicus sp. nov., Streptomyces mordarskii sp. nov., Streptomyces rapamycinicus sp. nov. and Streptomyces ruanii sp. nov. Int J Syst Evol Microbiol. 2008;58:1369–78.CrossRefGoogle Scholar
- 41.Collins FWJ, O’Connor PM, O’Sullivan O, Rea MC, Hill C, Ross RP. Formicin – a novel broad-spectrum two-component lantibiotic produced by Bacillus paralicheniformis APC 1576. Microbiol (United Kingdom). 2016;162:1662–71.Google Scholar
- 51.Guo H, Yang Y, Liu K, Xu W, Gao J, Duan H, et al. Comparative genomic analysis of Delftia tsuruhatensis MTQ3 and the identification of functional NRPS genes for siderophore production. Biomed Res Int 2016;2016:3687619.Google Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.