Introduction

The prevalence of drug-resistant pathogens has been a serious problem and effected the human life and agriculture. The World Health Organization (WHO) estimates ten million deaths by 2050 if multi-drug resistant (MDR) infections are not appropriately managed [1, 2]. All major antibiotic classes have been found to have antimicrobial  resistance, and the number of candidates for novel antibiotics is dwindling. Hence screening novel antibacterial compounds is critical for new drug discovery [3].

Microbial natural products are the important sources of drug discovery because of their structural diversity to make up more than 75% of antibiotics [4, 5]. The 99.99% rediscovery rate in traditional discovery pipelines of natural products is a big drawback [6]. However, the last decade has been a revival time for natural product discovery which was fueled by advances in analytical chemistry, bioinformatics, and whole genome sequencing [7].

Microbial genome sequencing revealed that they contain huge sources of cryptic BGCs, which have a larger capability to produce secondary metabolites. Availability of whole genome sequences and synthetic biology-inspired tools/approaches make it possible to utilize these BGCs to develop new chemicals with new structures, new activity and new targets [8].

Modern natural product discovery relies on, to a higher extent, on the microbial genome sequencing and computer mining for BGCs. Next stages include selecting unique BGCs, cloning and expressing selected BGCs in an optimal heterologous host or activating in situ silent BGCs. This pipeline (genome mining of NPs) takes less time on dereplication and streamlines NP discovery via the use of advanced computational, microbiological and synthetic biological approaches, to more extents, compared to traditional screening methods.

Most members of Burkholderia are well-known as pathogens to their hosts (plants or human) and now 44 members in this genus have been identified [9]. In recent years, many species of Burkholderia have been found to have the ability to excrete a range of secondary metabolites, including antibacterial, anticancer, herbicidal, and insecticidal chemicals that can act as bioremediation, biocontrol and plant growth promotion agents [10, 11].

More recently, the increasing data of Burkholderia genome sequences have shown a vast reservoir of NPs, such as non-ribosomal peptides (NRPs) and polyketides (PKs), with various pharmacological functions [12]. Many silent BGCs in Burkholderia genomes remain unexplored as potential drug development targets. Using genome mining approaches, many compounds, such as bolagladins/glidochelins, gladiofungins, thailandepsins/burkholdacs, romidepsin (FK228) and so on, were discovered from Burkholderia [13].

Due to the restrictive growth conditions, only a limited number of Burkholderia species have been isolated and identified as having NP BGCs or NP producers. Thus, the isolation of more species in Burkholderia from various environments and high-quality sequencing of Burkholderia genomes still are necessary for multi-omics research, which aids in the understanding of BGC regulation and rationally designing biosynthetic pathways of NPs [14, 15].

The purpose of this research was to determine the potential of a Burkholderia strain S-53 obtained from a small mountain area, which showed a quicker growth rate among three species of Burkholderia. Its genome was sequenced and analyzed for the presence of putative NP BGCs. Our data revealed this strain contains a substantial number of BGCs, indicating that its potential capability of producing new chemicals with biological activity.

Materials and methods

Isolation and characterization of Burkholderia

We collected soil samples from a small mountain (location: Tiesi Gang in Zoushi Town, 29.12755 N,111.564903E) in Changde City, Hunan Province, P.R. China using sterilized spoons. Soil samples were pretreated by drying at room temperature and then soaked in PBS buffer (10 mL PBS/g soil). Pretreated samples were serially diluted in PBS and seeded onto solid CYMG (8 g/l Casein peptone, 4 g/L Yeast extracts, 4.06 g/L MgCl2·2H2O, and 10 ml/L 50% Glycerin) medium, then cultivated  at 28 ℃ for 2 days. Whitish colonies were analyzed by colony PCR for 16S rRNA amplification with universal primers 27F(5’-AGAGTTTGATCCTGGCTCAG-3) and 1492R (5’-TACGACTTAACCCCAATCGC) under the standard PCR conditions (95℃ for 5 min, then 30 cycles of 94℃ for 1 min, 55–58℃ for 1 min and 72℃ for 90 s), followed by sequencing in Sangon Biotech (Shanghai) to pick out Burkholderia species. Morphological features of S-53 were recorded when cultivating on CYMG agar plates and molecular taxonomic approaches via TrueBac™ ID system and Type Strain Genome Server (TYGS)) were used to characterize the resultant isolates.

Measurement of the growth curve of S-53

S-53 was inoculated into CYMG microwell plates (400 μ L CYMG broth in each well) using 15 wells as parallel groups, and cultivated at 30℃ for 30 h. During cultivation, OD600 values for each well were recorded once at 1 h interval. Taken the OD600 value of each parallel well at the 0 h as the blank control, the difference (OD600/n–h -OD600/0-h) between the OD600 at each time-point (n- h) and OD600 at 0 h was calculated to represent the growth of S-53. Using the average values of OD600/n–h -OD600/0-h as Y-axis and time-point per h as X-axis, the growth curve of S-53 was obtained.

Extraction of high molecular weight genomic DNA

Burkholderia strain S-53 was inoculated into 50 mL of CYMG liquid culture medium with glass beads (3 ± 0.3 mm diameter) in a 250 mL baffled flask and cultured for 24 h at 30 °C in a 200-rpm orbital shaker. To extract genomic DNA (gDNA), 50 mL cultivated cells were collected during the exponential growth phase and washed twice with the same amount of 10 mM EDTA followed by 45 min at 37 °C with lysozyme (10 mg /mL). gDNA for gram negative bacteria was extracted using TIANamp Bacterial DNA kit from Tiangen Biochemical Technology (Beijing) Co., Ltd, according to the instructions from the manufacturer. We determined the quality and amount of extracted gDNA samples using 1% agarose gel electrophoresis on Nanodrop (Thermo Fisher Scientific, Waltham, MA, USA).

de novo Genome sequencing, assembly and annotation

To get fine sequence data, gDNA of S-53 was submitted to GENEWIZ Biotechnology Co., Ltd in Tianjin, China for genome sequencing with two methods:

  • For Illumina sequencing, firstly, DNA was fragmented into around 500 bp, repaired for blunt ends, and then modified with the base "A" through the 3' end, so that the DNA fragments can be connected to the linker with the "T" base at the 3' end. The target fragment ligation product is recovered, and then PCR is used to amplify the DNA fragments with adapters at both ends, and finally the qualified library is used for cluster preparation and sequencing.

  • For PacBio sequencing, 5–10 μg genomic DNA was sheared into 10–15 kb fragments using a g-TUBE device. Then library was constructed using the SMRTbell® Express Template Preparation Kit 2.0. The PCR products obtained using library DNA as templates were cleaned up and validated using an Agilent 2100 Bioanalyzer. Next, the qualified libraries were sequenced with pair-end PE150 on the illumina HiseqXten/Novaseq/MGI2000 System or on Sequel II sequencing platform.

The library sequenced were assembled using HGAP4/Falcon of WGS-Assembler 8.2 [16,17,18,19,20,21], then recorrected with software Pilon using previous illumine z data or Quiver.

Finding coding genes was conducted using the Prodigal [22]/Augustus [23] gene-finding software while detection of transfer RNAs (tRNAs) was done using the program tRNAscan-SE [24] with default parameter settings. rRNAs were identified by using Barrnap. Other RNAs were identified by rfam database. By BLAST using National Center for Biotechnology Information (NCBI) NR database, the coding genes were annotated (screening conditions were displayed in Table 1).

Table 1 Features of Burkholderia strain S-53

GO [25] (Gene Ontology) database and KEGG [26] (Kyoto Encyclopedia of Genes and Genomes) database were used for analyzing functions of genes and annotating the pathways. The database of COG/KOG [14] (Clusters of Orthologous Groups) was used for phylogenetic classification of proteins.

Phylogenetic analysis

Two methods were used for phylogenetic analysis of S-53:

  1. (i)

    Whole genome-based taxonomic analysis was conducted using the Genome BLAST Distance Phylogeny approach (GBDP) by uploading genome sequence data to the Type (Strain) Genome Server (TYGS), a free bioinformatics platform accessible at https://tygs.dsmz.de [27].

  2. (ii)

    A phylogenetic tree was constructed based on the 16S rRNA gene sequence of the Burkholderia strain S-53 and those extracted from the list of hits from EzBioCloud 16S database [28]. Evolutionary trees were established with maximum-likelihood methods [29] in MEGA X package [30]. The confidence of the tree topologies was assessed by 100 bootstrap replicates.

Whole genome sequences for bacterial identification

Bacterial identification utilizing whole genome sequences was conducted on the TrueBac™ ID technology, a cloud-based service [31] to reveal the genuine identification of bacterial isolates using a multitude of methods.

Comparative genomic studies/whole genome relatedness

For a whole genome-based taxonomic analysis, the genome sequence data were uploaded to the Type (Strain) Genome Server (TYGS), a free bioinformatics platform accessible at https://tygs.dsmz.de (accessed 28 December 2021). The Genome BLAST Distance Phylogeny approach (GBDP) was used to calculate dDDH (digital DNA–DNA hybridization) values and construct minimum evolution trees using TYGS [32, 27]. MEGA-X [30] was used to visualize GBDP trees. The ANI/AAI-Matrix calculator was used to calculate the average nucleotide identity (ANI) [33, 34]. The average amino acid identity (AAI) and average nucleotide identity (ANI) matrices of all conserved genes in the core genome were computed by the BLAST algorithm and visualized as heat maps for a more in-depth qualitative comparison between the genomes.

Using EZBIOCLOUD, the average nucleotide identity (ANI) of the assembled genome nucleotide files was calculated against the whole genome sequences of the strains used for 16S rRNA sequence analysis [35]. This method computes nucleotide identity through pairwise sequence alignment, yielding an overall average similarity of the genomes that is independent of sequence length.

The CGView (http://cgview.ca/) was used to generate a graphical representation of the BLAST result comparison of the available genomes to the genome of Burkholderia strain S-53.

Secondary metabolite biosynthetic gene cluster prediction

As a main approach for finding and annotating genes in BGCs across the genome, antiSMASH 6 [36] combined with ClusterBlast, ActiveSiteFinder, ClusterBlast, Cluster PFam analysis, SubClusterBlast, PRISM 4 and BAGEL 4 [36] was used for discovery of BGCs in the genome of S-53 for secondary metabolites.

Particularly, BAGEL 4 was used to mine BGCs for RiPPs and bacteriocin, whereas PRISM 4 was designed for structural prediction of secondary metabolites [37]. Several database systems, including the principles of hidden Markov model (HMM) [38], BLAST algorithm [39], PFAM [40], GenBank [41], UniprotKB [42], bactibase [43], CAMPR3 [44], and the MiBig data repository [45] were used for BGC annotation. As well, NapDos was used [46] to look for KS and C domains in these genomic sequences.

Results

Morphological and microscopic examination and phylogenetic analysis of 16S rRNA

In order to isolate more species of Burkholderia from the soil samples, we incubated a serial of isolates on CYMG medium at 30℃, followed by colony PCR amplification for 16S rRNA gene. Next by 16S rRNA-based phylogenetic analysis, 3 isolates were identified as Burkholderia, representing different species: S-53 shared the highest gene identity of 16S rDNA (99.93%) with the type strain Burkholderia stabilis (NCBI Blastn). The S-53 colonies on CYMG medium were recorded (Fig. 1).

Fig. 1
figure 1

Morphology of Burkholderia colonies on CYMG agar medium (cultivation duration:18 h) (a) and the growth curve of S-53 on CYMG broth (b)

The partial 16S rDNA gene sequence of the S-53 strain, 1337 bps in length, was deposited in the GenBank nucleotide database with an accession number of OM019084.

Among three strains we isolated, we found S-53 grows more rapidly (much shorter than 18 h into its stationary stage) than other two (longer than 18 h) (Fig. 1b). Because a higher growth rate is an important feature for species of Burkholderia for expressing of NPs, we chose S-53 for next genome sequencing.

Genomic features of Burkholderia strain S-53

The genome of Burkholderia strain S-53 is 8.254 Mbps in length and composed of 7239 protein-encoding genes, 63 tRNA genes, 18 rRNA genes and 72 ncRNA genes (Table 1 and Table 2).

Table 2 Statistics of function annotations for protein-encoding genes in S-53

Figure 2 showed a circular chromosome based S-53 genome sequence using CG View server (http://cgview.ca/), which is a web-based tool for comparative genomics analysis on circular genomes [47].

Fig. 2
figure 2

Schematic representation of the circular chromosome of Burkholderia strain S-53, created by CG View server (http://cgview.ca/). Circle 1 (outermost) displays the 3 Contigs while circle 2 displays the GC content plot and circle 3 (innermost) displays the GC skew. To indicate genome sizes inside and outside, the ruler was used in the chromosome map

The genome sequence of the Burkholderia strain S-53 has been deposited at GenBank under the GenBank accession CP090482-CP090484.

Bacterial strain identification by whole genome sequence and comparative genome analysis of S-53

Here, using TrueBac™ ID system [31] for bacterial identification based on whole genome sequence of S-53 strain, it could be identified as Burkholderia pyrrocinia (Table 3).

Table 3 Identification of Burkholderia S-53 based on whole genome sequence

Further, we performed comparative genome analysis of Burkholderia S-53 (Table 4): the pairwise comparison of Burkholderia strain S-53 was recorded from TYGS [27] which is a fast-increasing discipline of genome-based taxonomy descriptions of new genera, species, and subspecies (https://tygs.dsmz.de/).

Table 4 Pairwise comparisons of S-53 as query strain using TYGS

We also used the TrueBac™ ID [31] to make genome-wide alignment, and found Burkholderia strain S-53 has the highest similarity to Burkholderia pyrrocinia, and Burkholderia stabilis (Table 5). Its taxonomic ranks include Bacteria, Proteobacteria, Betaproteobacteria, Burkholderiales, Burkholderiaceae and Burkholderia.

Table 5 Whole genome alignment using TrueBac™ ID system

Phylogenetic analysis via GBDP method

Using Genome BLAST Distance Phylogeny (GBDP) method and tree builder service, the phylogeny tree of Burkholderia strain S-53 using its whole genome sequence was created while FastME was used to estimate the tree using GBDP intergenomic distances derived from complete proteomes.

GBDP phylogenetic tree constructed by using 16S rRNA indicated that S-53 is similar to Burkholderia pyrrocinia DSM10685K (Fig. 3a). On the other hand, GBDP phylogenetic tree constructed by using whole genome indicated that S-53 is similar to B. stabilis ATCCBAA-67 (Fig. 3b).

Fig. 3
figure 3

GBDP phylogenetic trees of S-53. The Genome BLAST Distance Phylogeny (GBDP) method was used to estimate the phylogeny. GBDP intergenomic distances computed from complete proteomes were used to estimate the tree using FastME. Using 16S rRNA sequences (a) or whole genome sequences (b), GBDP phylogenetic trees were constructed

In all, 16S rRNA-based GBDP phylogenetic tree and whole genome alignment and comparative genome analysis suggested it to be Burkholderia pyrrocinia, while GBDP phylogenetic tree constructed by using whole genome supported it to be B. stabilis. Combining these analyses, we concluded it to be closer to Burkholderia pyrrocinia.

On the other hand, the phylogenetic tree was constructed from EzBioCloud 16S database by maximum-likelihood methods by Mega X application with 100 bootstrap values depicted in the Fig. 4. According to the maximum likelihood method, S-53 is close to the Burkholderia stabilis ATCC BAA-67 and Burkholderia pyrrocinia DSM 10,685.

Fig. 4
figure 4

Evolutionary analysis of S-53 using the Maximum Likelihood method. The proportion of phylogenetic trees with the same taxonomy is given next to the branches. For the heuristic search, we used Neighbor-Join and BioNJ algorithms on a matrice of pairwise distances evaluated using Maximum Composite Likelihood (MCL) and chose the topology with the best log likelihood value. The tree's branch lengths are measured in substitutions per location. These 51 nucleotides were studied. Gaps and incomplete data were removed from all spots (complete deletion option). The final dataset has 1279 positions. MEGA X was used to study evolution

Prediction of NP BGCs in S-53 genome

Using antiSMASH 6.0 [36], BAGEL 4 [48] and PRISM 4 [49], we found a lot of BGCs on the Burkholderia strain S-53 genome for different secondary metabolites.

Through the prediction using antiSMASH 6.0, 15 BGCs were discovered (Table 6 and Fig. 5). The major BGC types include those for NRPs (2 BGCs), terpene (4GCs) and hybrid (3 BGCs) (Fig. 5).

Table 6 The analysis of biosynthetic pathways in Burkholderia sp. S-53 by antiSMASH 6.0
Fig. 5
figure 5

Localization of secondary metabolite clusters in the genome of Burkholderia sp. S-53

Among 15 BGCs, five BGCs (cluster 3, 5, 8, 10 and 13 regions 1.3, 2.1, 3.1, 3.3 and 3.6) were more than 50% identical to known BGCs. Other BGCs exhibited just a low degree of similarity or resemblance to previously identified BGCs, implying that Burkholderia sp. S-53 has a significant potential for the production of novel NPs in the future.

Moreover, we performed BAGEL analysis on S-53 genome and identified additional 2 different clusters for bacteriocins and RiPPs (Table 7).

Table 7 Clusters in S-53 for RiPP and bacteriocin predicted by BAGEL

In addition, PRISM algorithm (https://prism.adapsyn.com/results/4c5c8259bfef7b827d3c7b9cdc95df6c) was used here to predict the structures of genetically encoded natural products using Burkholderia sp. S-53 genomes.

Figure 6 showed predicted compounds by a total 10 clusters, including 3 for NRPs, 2 for PKs, 1 for Class II/III bacteriocin, 1 for aryl polyene and 1 for acyl homoserine lactone (Fig. 6).

Fig. 6
figure 6

Predicted structures of secondary metabolites by S-53 using PRISM

Discussion and conclusion

The high potential of Burkholderia to produce bioactive NPs has been reported with an increasing publishing record in decade years. Moreover, the rapid growth rate and low fermentation cost make them as a potential host for heterologous expression of some NP BGCs, aided by the establishment of genetic manipulation systems [50].

In this work, genomic investigation of microbes isolated from underexplored mountain habitats found several strains of Burkholderia species, among of which, S-53 attracted us for its comparatively quicker growth rate (16–18 h for entering stationary stage, compared to 24 h for general Burkholderia species), as a critical feature when it could be developed into a host for expressing some NP BGCs later.

We identified it using different methods and analyzed its evolution. Given that modern bacterial taxonomy uses genome sequence data to identify taxa, by means of genome sequence data, identification of a bacterial species is considered always to be more correct and persuasive. Thus, though different molecular methods for bacterial identification gave a little different result, our data deduced it to be Burkholderia pyrrocinia.

Other subspecies of Burkholderia pyrrocinia were ever isolated from different habitats, such as Burkholderia pyrrocinia JK-SH007, a plant growth-promoting bacteria from plat rhizosphere [51]. Burkholderia pyrrocinia, along with Burkholderia cenocepacia and Burkholderia ambifari, was referenced as Burkholderia cepacia complex (BCC) species, which are most frequently associated with roots of crop plants [52].

Taxonomy and identification of species in Burkholderia still are quite challenging. Though a high similarity of 16S rDNA ranging 98–100% often is used as “common standard” for bacterial identification at species level, it could not be applicable to classification of Burkholderia species [53], especially, for classification of BCC group of Burkholderia. So, the whole-genome-sequence-based taxonomic analysis could give comparably more reliable results, when combining other molecular methods.

Genomics-based bottom-up techniques have been developed to reveal previously undiscovered natural product biosynthesis pathways [54]. Here, whole genome sequencing and bioinformatic analyses of Burkholderia strain S-53 revealed many secondary metabolite biosynthetic gene clusters. Moreover, bioinformatics analysis uncovered more than two-thirds of BGCs in S-53 are not related to recognized clusters (Table 6).

These data supported that S-53 could be a good candidate used for identifying new NPs. Next, more research is needed to improve, isolate, and identify new bioactive natural products from this strain and to investigate the possibility of it to be as chassis for expressing of new NPs.