Introduction

A large number of bioactive secondary metabolites have been discovered from actinomycetes1,2. In past years, each secondary metabolite producer was taxonomically identified at the species level based on morphological, cultural, physiological and chemical features. Consequently, correlation data between each species and its secondary metabolites have steadily accumulated. For example, Streptomyces griseus, Streptomyces avermitilis and Streptomyces tsukubensis are well known to produce streptomycin, avermectin and tacrolimus, respectively3,4,5. Today, however, the taxonomic positions of strains producing new secondary metabolites are usually determined only at the genus level based on their 16S rRNA gene sequences, and species-level assignment is not always carried out in natural product research. Although species-level classification of secondary metabolite producers gives crucial information to researchers seeking new microbial compounds, the relationship between species names and secondary metabolites remains unclear in most cases.

Genome analyses of actinomycetes are revealing that various biosynthetic gene clusters (BGCs) for secondary metabolites are encoded in their genomes and that about half to three quarters of the clusters are associated with nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) pathways6, which suggests that nonribosomal peptides, polyketides and their hybrid compounds are the major secondary metabolites of actinomycetes. These compounds often show pharmaceutically useful bioactivities, and many have been developed into drugs such as antibiotics, anticancer agents, and immunosuppressants. Hence, genome analysis focused on NRPS and PKS gene clusters is now often employed to evaluate the capacity of actinomycete strains to produce secondary metabolites7,8,9,10.

A marine-derived Streptomyces sp. TP-A0882 produces butyrolactol11. We recently identified the gene clusters responsible for butyrolactol and thiazostatin biosynthesis in this strain using whole genome analysis12. In the present study, we sequenced the genomes of three type strains taxonomically closely related to strain TP-A0882, and conducted in silico DNA-DNA hybridization (DDH) to identify this strain at the species level. We further analyzed secondary metabolite-BGCs (smBGCs) such as NRPS and PKS gene clusters in each of the genomes to elucidate the diversity of secondary metabolite-biosynthetic pathways among the taxonomically close species and provide information useful for researchers screening Streptomyces strains for new compounds.

Results

Taxonomic identification of butyrolactol-producing Streptomyces sp. TP-A0882

The 16S rRNA gene sequence of Streptomyces sp. TP-A0882 showed ≥99% nucleotide similarity to those of S. diastaticus subsp. ardesiacus NRRL B-1773T (99.9%, 1464/1465), S. coelicoflavus NBRC 15399T (99.4%, 1455/1464), and S. rubrogriseus LMG 20318T (99.0%, 1448/1462). Next, we sequenced the genomes of S. diastaticus subsp. ardesiacus NBRC 15402T, S. coelicoflavus NBRC 15399T, and S. rubrogriseus NBRC 15455T and compared them with the previously sequenced genome of Streptomyces sp. TP-A0882 to estimate their DNA-DNA relatedness values. As shown in Table 1, the DDH estimate for the comparison between Streptomyces sp. TP-A0882 and the S. diastaticus subsp. ardesiacus type strain was 94.4%. Because the probability that the DDH estimate exceeds 70% was calculated as 97.1% (Table 1), these two strains were confirmed to belong to the same species. In contrast, the DDH estimates between Streptomyces sp. TP-A0882 and the other taxonomically close species were lower than 46%. Therefore, we identified Streptomyces sp. TP-A0882 as S. diastaticus subsp. ardesiacus.
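
The 16S rRNA similarity percentages quoted above follow directly from the reported counts of matching and aligned positions. The short Python sketch below simply recomputes them; the function and variable names are illustrative and do not correspond to any tool used in this study.

```python
# Matching / aligned nucleotide positions reported in the text for
# Streptomyces sp. TP-A0882 versus each type strain.
comparisons = {
    "S. diastaticus subsp. ardesiacus NRRL B-1773T": (1464, 1465),
    "S. coelicoflavus NBRC 15399T": (1455, 1464),
    "S. rubrogriseus LMG 20318T": (1448, 1462),
}


def percent_identity(matches: int, aligned_length: int) -> float:
    """Return pairwise identity as a percentage of the aligned positions."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * matches / aligned_length


# Round to one decimal place, as in the text.
identities = {
    name: round(percent_identity(matches, length), 1)
    for name, (matches, length) in comparisons.items()
}

for name, identity in identities.items():
    print(f"{name}: {identity}%")
```

All three values round to 99.0% or higher, which is why 16S rRNA gene similarity alone cannot separate these species and genome-based DDH was needed.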

Table 1 Genome sequencing and digital DNA-DNA hybridization (DDH) values estimated by GGDC 2.1.

NRPS and PKS gene clusters

In our previous study, we sequenced the genome of Streptomyces sp. TP-A0882 and identified BGCs for butyrolactol and thiazostatin12. The genome contains at least 14 gene clusters coding for proteins involved in NRPS and PKS pathways (Table 2). To validate whether taxonomically close strains share similar secondary metabolite biosynthetic pathways, in the current study we surveyed the NRPS and PKS gene clusters in the genomes of S. diastaticus subsp. ardesiacus NBRC 15402T, S. coelicoflavus NBRC 15399T, and S. rubrogriseus NBRC 15455T.

Table 2 Open reading frames (ORFs) encoding nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs) in NRPS and PKS gene clusters from Streptomyces sp. TP-A0882 (NBRC 110030).

S. diastaticus subsp. ardesiacus NBRC 15402T harbors four NRPS gene (nrps) clusters, one hybrid PKS/NRPS gene (pks/nrps) cluster, at least four type I PKS gene (t1pks) clusters, two type II PKS gene (t2pks) clusters, and three type III PKS gene (t3pks) clusters, as shown in Tables 3 and 4. The number and types of gene clusters are the same as those of Streptomyces sp. TP-A0882, and the sequences show >99% amino acid sequence identity to those of Streptomyces sp. TP-A0882 (NBRC 110030) based on BLAST analysis in all cases except ORF77-1 and ORF80-1 (Table 4). The structures of the predicted products of the gene clusters from NBRC 15402T also matched those of TP-A0882. These results suggest that the two S. diastaticus subsp. ardesiacus strains contain identical NRPS and PKS pathways.

Table 3 Numbers of secondary metabolite-biosynthetic gene clusters (smBGCs) encoded in each genome.
Table 4 ORFs encoding NRPSs and PKSs in NRPS and PKS gene clusters from S. diastaticus subsp. ardesiacus NBRC 15402T.

S. coelicoflavus NBRC 15399T harbors four nrps clusters, two pks/nrps clusters, three t2pks clusters, and one t3pks cluster, as shown in Table 5. Unlike typical Streptomyces strains, this strain has no t1pks cluster. nrps-i, nrps-ii, pks/nrps-i, t2pks-i, and t3pks-i were predicted to be responsible for the synthesis of coelibactin, coelichelin, prodiginine, gray spore pigment, and tetrahydroxynaphthalene (THN), respectively, based on high similarities (85–99% amino acid sequence identity) to SCO7681-7683, SCO0492 (CchH), SCO5886-SCO5894 (Red), SCO5318-SCO5316 (WhiE), and SCO1206 (RppA) of Streptomyces coelicolor A3(2)6,13, respectively. Based on the domain and module organizations and substrate-selective residues in the A domains, nrps-iii and nrps-iv were predicted to synthesize nonribosomal peptides consisting of eight amino acids and 13 amino acids, respectively. The product of pks/nrps-ii was speculated to be a novel oxazolomycin analog because the domain organization is similar, but not identical, to that of the BGCs for oxazolomycins14. Although the remaining two gene clusters (t2pks-ii, t2pks-iii) are likely to be responsible for the synthesis of aromatic polyketides, their structures could not be predicted from the sequence information alone. Analysis of the genome sequence of S. coelicoflavus strain ZG0656, the only S. coelicoflavus strain whose genome sequence has been published15, indicated that all of the S. coelicoflavus NBRC 15399T gene clusters (Table 5) are also present in strain ZG0656 with >97% amino acid sequence identity based on BLAST comparisons.

Table 5 ORFs encoding NRPSs and PKSs in NRPS and PKS gene clusters of S. coelicoflavus NBRC 15399T.

S. rubrogriseus NBRC 15455T harbors four nrps clusters, one pks/nrps cluster, at least three t1pks clusters, two t2pks clusters, and two t3pks clusters (Table 6). nrps-a, nrps-b, nrps-c, pks/nrps-a, t1pks-a, t1pks-b, t2pks-a, t3pks-a, and t3pks-b were predicted to be responsible for the synthesis of coelibactin, coelichelin, calcium-dependent antibiotic (CDA), prodiginine, coelimycin, eicosapentaenoic acid, gray spore pigment, THN, and phenolic lipid, respectively, based on high similarities (91–100% amino acid sequence identity) to SCO7681-7683, SCO0492 (CchH), SCO3230-SCO3232 (CDA peptide synthetases), SCO5886-SCO5894 (Red), SCO6275-SCO6273 (Cpk), SCO0126-SCO0127, SCO5318-SCO5316 (WhiE), SCO1206 (RppA), and SCO7671 (SrsA ortholog)6,13, respectively. Based on the domain and module organization and substrate-selective residues in the A domains, nrps-d was predicted to synthesize a peptide containing cysteine. The other t1pks cluster(s) were not completely sequenced, but their predicted PKS proteins do not have high sequence similarity to known PKS proteins, suggesting that the product(s) might be novel. t2pks-b is likely to synthesize aromatic polyketides, but the products could not be predicted because the sequence does not show a high degree of similarity to any PKS whose products have been elucidated. Among the 12 gene clusters, all except the other t1pks genes and t2pks-b show >93% sequence similarity to the corresponding genes from S. coelicolor A3(2), suggesting that most of the gene clusters in S. rubrogriseus NBRC 15455T are also present in S. coelicolor A3(2).

Table 6 ORFs encoding NRPSs and PKSs in NRPS and PKS gene clusters of S. rubrogriseus NBRC 15455T.

Conservation of NRPS and PKS gene clusters among taxonomically close species

As summarized in Fig. 1a, BGCs for coelibactin, coelichelin, gray spore pigment, and THN are present in all of the strains. The prodiginine biosynthetic gene (red) cluster is not present in S. diastaticus subsp. ardesiacus strains NBRC 15402T and TP-A0882, but is present in both S. coelicoflavus NBRC 15399T and S. rubrogriseus NBRC 15455T. The phenolic lipid biosynthetic gene (srs) cluster is present in both S. diastaticus subsp. ardesiacus strains and S. rubrogriseus NBRC 15455T. Products of the nrps-3 cluster from the S. diastaticus subsp. ardesiacus strains and the nrps-iii cluster from S. coelicoflavus NBRC 15399T include mCys-Val-x-x-Ser. However, their products are actually not the same (S. diastaticus subsp. ardesiacus strains, mCys-Val-x-x-Ser; S. coelicoflavus NBRC 15399T, x-x-Ser-mCys-Val-x-x-Ser). Overall, the S. diastaticus subsp. ardesiacus strains, S. coelicoflavus NBRC 15399T, and S. rubrogriseus NBRC 15455T harbor at least eight, four, and six species-specific gene clusters, respectively.
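
The classification into conserved, shared, and species-specific clusters can be expressed as a simple presence/absence tally. The Python sketch below is a minimal illustration that uses only the six products whose distribution is stated explicitly in this paragraph; it is not the procedure used to build Fig. 1, and the dictionary layout and function name are assumptions made for illustration.

```python
from collections import Counter

# Presence of predicted products per species, restricted to the products
# whose distribution is stated explicitly in the paragraph above.
presence = {
    "S. diastaticus subsp. ardesiacus": {
        "coelibactin", "coelichelin", "gray spore pigment", "THN",
        "phenolic lipid",
    },
    "S. coelicoflavus": {
        "coelibactin", "coelichelin", "gray spore pigment", "THN",
        "prodiginine",
    },
    "S. rubrogriseus": {
        "coelibactin", "coelichelin", "gray spore pigment", "THN",
        "prodiginine", "phenolic lipid",
    },
}


def classify_clusters(presence_by_species):
    """Group products by the number of species in which they occur."""
    counts = Counter(
        product
        for products in presence_by_species.values()
        for product in products
    )
    n_species = len(presence_by_species)
    groups = {"conserved": set(), "shared": set(), "specific": set()}
    for product, count in counts.items():
        if count == n_species:
            groups["conserved"].add(product)   # present in every species
        elif count > 1:
            groups["shared"].add(product)      # present in some, not all
        else:
            groups["specific"].add(product)    # present in one species only
    return groups


groups = classify_clusters(presence)
for label, products in groups.items():
    print(label, sorted(products))
```

With the full cluster lists from Tables 2 to 6 as input, the same tally would also populate the species-specific group, which is empty here only because this subset omits those clusters.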

Figure 1

Schematic diagram showing diversity of NRPS & PKS gene clusters (a) and the other biosynthetic gene clusters (b) in the taxonomically close species. As nrps-3 of the S. diastaticus subsp. ardesiacus strains and nrps-iii of S. coelicoflavus NBRC 15399T show partial sequence similarity, the diagram shows putative sharing between these two species. However, the gene products of nrps-3 and nrps-iii are divergent (mCys-Val-x-x-Ser and x-x-Ser-mCys-Val-x-x-Ser, respectively). Abbreviations: CDA, calcium-dependent antibiotic; EPA, eicosapentaenoic acid; GPS, gray spore pigment; m, methyl-; NIS, NRPS-independent siderophore; pk, moiety derived from PKS pathway; THN, tetrahydroxynaphthalene; x, unidentified amino acid; y, unknown building block. Footnotes: a, the lantipeptide BGC, whose precursor peptide sequences are AVLINLDhbDDGCGDhaDhbCDhaDhaPCADhbNVA and CNGDhaCADhbNVA, is not present in the genome of S. diastaticus subsp. ardesiacus NBRC 15402T; b, including desferrioxamine; c, albaflavenone, hopene, carotenoid & geosmin.

The other secondary metabolite-biosynthetic gene clusters

In addition to NRPS and PKS gene clusters, the other smBGCs were also investigated. Thirteen to 18 gene clusters are encoded in each genome, as shown in Table 3. Table 7 lists the clusters with putative products and loci. Homologous gene clusters are aligned in the same row in the table. S. diastaticus subsp. ardesiacus TP-A0882 and NBRC 15402T share the same set of gene clusters, except for a BGC for lantipeptides, suggesting that the two strains contain almost identical secondary metabolite biosynthetic pathways. Among the 18 BGCs of S. coelicoflavus NBRC 15399T, 13 are also present in S. coelicoflavus strain ZG0656, whereas three lantipeptide and two terpene BGCs are not. All 15 BGCs identified from S. rubrogriseus NBRC 15455T are also present in S. coelicolor A3(2) (data not shown). BGCs for bacteriocin, ectoine, indole, melanin, two siderophores, and four terpenes are shared among the three species, whereas 3 to 5 BGCs are specific to each species (Table 7, Fig. 1b).

Table 7 Loci encoding the other smBGCs in the draft genome sequences.

Discussion

The genome analysis conducted in this study shows that S. diastaticus subsp. ardesiacus strains TP-A0882 and NBRC 15402T share an almost identical set of smBGCs, while S. coelicoflavus strains NBRC 15399T and ZG0656 share their own similar set of gene clusters. Previous studies on Nocardia brasiliensis8 and Salinispora species16 have also shown that most smBGCs are common within each species, with strain-specific ones being relatively limited. These results suggest that actinomycete strains belonging to the same species are likely to possess similar secondary metabolite biosynthetic pathways.

In contrast, only a limited number of smBGCs are shared by the different species examined in this study, even though they have ≥99% 16S rRNA gene sequence similarity and are thus considered taxonomically close. We identified a total of 49 different smBGCs, including 25 NRPS and PKS gene clusters, from the three species. Among them, 14 clusters, responsible for production of coelibactin, coelichelin, gray spore pigment, THN, bacteriocin, ectoine, indole, melanin, two types of NRPS-independent siderophores, and four types of terpenes, are conserved among the three species, while an additional five clusters for phenolic lipid, prodiginine, nonribosomal peptide, lantipeptide, and terpene syntheses are shared by two species. Coelibactin and coelichelin are iron-chelating molecules, known as siderophores, that are involved in the uptake of ferric iron17. Like gray spore pigment and melanin, THN is involved in pigmentation, as it is used in melanin formation18. Pigment production is often examined in taxonomic studies19. Phenolic lipids are components of the cell wall and are involved in resistance to β-lactam antibiotics by affecting the characteristics and rigidity of the cytoplasmic membrane/peptidoglycan20. Ectoine is an osmolyte involved in protection against extreme osmotic stress21. Therefore, many of the conserved/shared gene clusters identified in this study are physiologically and/or taxonomically important. The remaining 33 smBGCs are species-specific, with each of the three species containing eleven different specific clusters.

Unexpectedly, most of the gene clusters in S. rubrogriseus NBRC 15455T are also present in S. coelicolor A3(2) (correctly classified as Streptomyces violaceoruber)22. As the sequence similarities in these regions are very high (>93%), we considered it possible that strains NBRC 15455T and A3(2) might actually be the same species. To clarify this, we conducted in silico DDH analysis of the two genome sequences. The resulting estimated DDH value is 70.3% (67.3–73.2%), which lies right on the borderline between same-species and different-species assignments, and the probability that the value exceeds 70% was calculated to be 78.9% (data not shown). Orthologs of the other t1pks cluster(s) and t2pks-b found in S. rubrogriseus NBRC 15455T (Table 6) were not identified in S. coelicolor A3(2), while orthologs of SCO5073-SCO5092 (actinorhodin), SCO6826-SCO6827, SCO7669-SCO7671 (aromatic polyketide), SCO7221 (germicidin), SCP1.228c-SCP1.246 (methylenomycin), SCO0381-SCO0401, and SCO7700-SCO7701 (2-methylisoborneol) present in S. coelicolor A3(2) could not be identified in S. rubrogriseus NBRC 15455T. These findings indicate that strains NBRC 15455T and A3(2) are likely to be separate species. Very recently, phylogenetic relationships among Streptomyces species were examined using multi-locus sequence analysis. The study showed that S. violaceoruber was distinct from S. rubrogriseus23, supporting our current conclusion.

Here, we have shown an example in which actinomycete strains belonging to the same species share a conserved set of smBGCs, whereas different species each harbor species-specific smBGCs in addition to some common ones, even if the species are taxonomically close. Relationships between species and smBGCs in actinomycetes were reported by Doroghazi et al.24, Ziemert et al.16, and Seipke et al.25. As the study by Doroghazi et al. is a large-scale analysis of 840 taxonomically diverse actinobacterial strains encompassing many genera, they did not compare smBGCs between taxonomically close Streptomyces species. Ziemert et al. reported the diversity and evolution of PKS and NRPS gene clusters within the genus Salinispora. In contrast to rare actinomycetes such as Salinispora, relationships between species and smBGCs are less well elucidated in the genus Streptomyces. Seipke et al. showed strain-level diversity of smBGCs in S. albus. However, the strains were actually not S. albus23 and may not belong to a single species but instead fall into two independent genomospecies with an in silico DDH value of less than 70% (our unpublished data). As the genus Streptomyces includes many species, accumulation of data for more Streptomyces species is needed to clarify whether smBGCs are diverse at the strain level or conserved at the species level. As reported here, genome sequence-based analysis will provide more insight into relationships between Streptomyces species and their secondary metabolites.

Methods

Strains

Streptomyces diastaticus subsp. ardesiacus NBRC 15402T, Streptomyces coelicoflavus NBRC 15399T, and Streptomyces rubrogriseus NBRC 15455T were obtained from the NBRC (Biological Resource Center, National Institute of Technology and Evaluation, Chiba, Japan) culture collection. Streptomyces sp. TP-A0882 has been deposited into the NBRC culture collection and registered as NBRC 110030 (ref. 12).

Analysis of 16S rRNA gene sequences

The 16S rRNA genes were amplified using two universal primers, 9F and 1541R, and sequenced according to an established method26. EzTaxon-e was used for basic local alignment search tool (BLAST) analysis of the sequences27.

Genome sequencing

Genomic DNA was prepared from each of the strains as described previously28. The prepared DNA was subjected to paired-end sequencing using the MiSeq sequencing system (Illumina, San Diego, CA, USA) as per the manufacturer’s instructions. The sequence redundancies for the three draft genomes were 74- to 128-fold. The sequence reads were assembled using Newbler v2.8 (454 Life Sciences, Branford, CT, USA) and subsequently finished using GenoFinisher29.

In silico DDH

DNA-DNA relatedness values were estimated from the genome sequences using Genome-to-Genome Distance Calculator (GGDC) 2.1, available from the Deutsche Sammlung von Mikroorganismen und Zellkulturen (DSMZ) website (http://ggdc.dsmz.de/distcalc2.php)30.
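
The way the DDH estimates are interpreted in this study can be summarized as a small decision rule around the 70% species threshold. The Python sketch below is only an illustration of that interpretation: it does not reproduce the GGDC calculation itself, and the function name and the "borderline" label for a confidence interval spanning the threshold are assumptions introduced here.

```python
from typing import Optional, Tuple

# DDH cutoff (%) used in the text for species delineation.
SPECIES_THRESHOLD = 70.0


def interpret_ddh(estimate: float,
                  interval: Optional[Tuple[float, float]] = None) -> str:
    """Interpret a DDH estimate (%) relative to the 70% species threshold.

    If a confidence interval is supplied and it spans the threshold, the
    comparison is flagged as borderline (an illustrative convention, not
    part of GGDC output).
    """
    if interval is not None:
        low, high = interval
        if low < SPECIES_THRESHOLD <= high:
            return "borderline"
    if estimate >= SPECIES_THRESHOLD:
        return "same species"
    return "different species"


# Values reported in the Results and Discussion.
tp_a0882_vs_type_strain = interpret_ddh(94.4)
nbrc15455_vs_a3_2 = interpret_ddh(70.3, (67.3, 73.2))

print(tp_a0882_vs_type_strain)  # TP-A0882 vs S. diastaticus subsp. ardesiacus
print(nbrc15455_vs_a3_2)        # S. rubrogriseus NBRC 15455T vs S. coelicolor A3(2)
```

A borderline result such as the second comparison is the case in which additional evidence, here the presence or absence of individual gene clusters, was used to reach a conclusion.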

Analysis of NRPS and PKS gene clusters

Coding regions in the draft genome sequences were predicted using Prodigal v2.6 (ref. 31). NRPS and PKS gene clusters were determined as previously reported9,10. A BLASTP search was performed using the NCBI Protein BLAST program (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins), in which the non-redundant protein sequence (nr) database was chosen as the Search Set. antiSMASH32 was used to predict substrates for adenylation, acyltransferase, and CoA ligase domains.
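
Identity thresholds such as the ">99% amino acid sequence identity" criterion used in the Results can be applied programmatically when BLAST hits are exported in the standard 12-column tabular format (the BLAST+ `-outfmt 6` layout). The Python sketch below is a minimal illustration under that assumption; the searches in this study were run through the NCBI web interface, and the ORF and subject identifiers in the example rows are invented placeholders, not data from this work.

```python
import csv
import io

# Column order of the standard 12-column BLAST tabular format.
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def hits_above_identity(tabular_text: str, min_identity: float):
    """Return (query, subject, percent identity) for hits above a cutoff."""
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    hits = []
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(FIELDS, row))
        identity = float(record["pident"])
        if identity > min_identity:
            hits.append((record["qseqid"], record["sseqid"], identity))
    return hits


# Hypothetical example rows (placeholder identifiers and values).
example = (
    "orfA\thitX\t99.5\t1200\t6\t0\t1\t1200\t1\t1200\t0.0\t2400\n"
    "orfB\thitY\t85.2\t800\t118\t0\t1\t800\t1\t800\t0.0\t1300\n"
)

conserved_hits = hits_above_identity(example, 99.0)
print(conserved_hits)
```

Note that percent identity alone does not account for alignment coverage, so a filter of this kind would normally be combined with a check on the aligned length relative to the query.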

Analysis of the other secondary metabolite biosynthetic gene clusters

BGCs other than PKS and NRPS gene clusters in the draft genome sequences were searched for using antiSMASH32.

Nucleotide accession numbers

The draft genome sequences in this study were deposited in GenBank/EMBL/DDBJ under the accession numbers shown in Table 1.