Linking species concepts to natural product discovery in the post-genomic era

Open Access

DOI: 10.1007/s10295-009-0683-z

Cite this article as:
Jensen, P.R. J Ind Microbiol Biotechnol (2010) 37: 219. doi:10.1007/s10295-009-0683-z


A widely accepted species concept for bacteria has yet to be established. As a result, species designations are inconsistently applied and tied to what can be considered arbitrary metrics. Increasing access to DNA sequence data and clear evidence that bacterial genomes are dynamic entities that include large numbers of horizontally acquired genes have added a new level of insight to the ongoing species concept debate. Despite uncertainties over how to apply species concepts to bacteria, there is clear evidence that sequence-based approaches can be used to resolve cohesive groups that maintain the properties of species. This cohesion is clearly evidenced in the genus Salinispora, where three species have been discerned despite very close relationships based on 16S rRNA sequence analysis. The major phenotypic differences among the three species are associated with secondary metabolite production, which occurs in species-specific patterns. These patterns are maintained on a global basis and provide evidence that secondary metabolites have important ecological functions. These patterns also suggest that an effective strategy for natural product discovery is to target the cultivation of new Salinispora taxa. Alternatively, bioinformatic analyses of biosynthetic genes provide opportunities to predict secondary metabolite novelty and reduce the redundant isolation of well-known metabolites. Although much remains to be learned about the evolutionary relationships among bacteria and how fundamental units of diversity can be resolved, genus and species descriptions remain the most effective method of scientific communication.


Keywords: Species concepts, Natural product discovery, Taxonomy


A fundamental goal of biology is to study the diversity of living organisms. This activity is broadly encompassed within the discipline of biological systematics, which includes developing methods to place organisms into taxonomic groups that share common features and evolutionary histories as well as describing new organisms that fall outside of circumscribed taxonomic boundaries. The foundations for these activities are rooted in Latin-based genus and species binomial nomenclature, which provides a common language for scientific communication.

Although identifying organisms to the species level has long been a fundamental goal of biology, developing broadly accepted species concepts that capture the complex evolutionary forces driving biological diversification remains problematic and controversial [2]. Prokaryotes present a particular challenge in that they reproduce largely by binary fission, and thus traditional biological species concepts are not readily applicable. In light of new information provided by DNA sequencing, there is an ongoing and high-profile debate over how to recognize fundamental units of prokaryotic diversity that maintain the properties associated with species [1, 16, 24]. This debate is fueled by the observation that bacteria are adept at exchanging genes horizontally both among individuals in the same population and, to a lesser extent, between more distantly related taxa [33]. The process of horizontal gene transfer (HGT) can occur at homologous loci (homologous recombination) or can result in the introduction of new genes. Homologous recombination has a homogenizing effect on genetic diversification that can blur species boundaries and create uncertainty over taxonomic assignments when different genes within an individual strain maintain different evolutionary relationships. Although methods such as multilocus sequence typing can reveal these inconsistencies [34], concatenated trees are often strongly supported and produce relationships consistent with 16S rRNA phylogenies, suggesting that highly probable evolutionary relationships among closely related bacteria are resolvable even in the presence of HGT. Despite the uncertainties over how to incorporate HGT into an effective species concept for bacteria [9], well-chosen phylogenetic markers are rarely subject to this process and continue to be used to effectively identify cohesive clusters of bacteria that share evolutionary histories consistent with what would be expected for species-level taxa.

There are a number of reasons why it is important to describe new bacterial species even if current methods are flawed and revisions to existing taxa ultimately need to be made once more meaningful species concepts have been developed. One primary reason is that species descriptions require the designation of a type strain and its deposit into a publicly accessible culture collection. Also required is the deposit of relevant sequence data into a public database. Given that 16S rRNA gene sequencing has become the primary tool for initial taxonomic assignment, the importance of having type strain sequences available for comparative purposes cannot be overstated. These comparisons help to prevent taxonomic drift, or the spread of taxonomic boundaries across increasingly diverse lineages, as can happen when species names are sequentially assigned to query sequences based on top database matches as opposed to the closest type strain. The use of type strains for taxonomic assignments provides a more accurate assessment of the taxonomic novelty of the query strain, whereas the analysis of non-type strain sequences reveals a more complete picture of the distribution of related strains in nature and the diversity of the lineage to which the type strain belongs. Fortunately, the NCBI RefSeq Targeted Loci Project includes a curated database of prokaryotic type strain 16S rRNA sequences that, when completed, will facilitate future comparisons.
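The difference between naming a query after its top database match and naming it after the closest type strain can be illustrated with a small sketch. The sequences, strain labels, and the `identity` and `best_match` helpers below are hypothetical and chosen only for illustration; a real comparison would use full-length, properly aligned 16S rRNA gene sequences.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_match(query: str, references: dict) -> tuple:
    """Return (name, identity) for the reference sequence closest to the query."""
    name = max(references, key=lambda n: identity(query, references[n]))
    return name, identity(query, references[name])


# Toy 20-nt "16S" fragments (made up for illustration).
type_strains = {
    "species A (type strain)": "ACGTACGTACGTACGTACGT",
    "species B (type strain)": "ACGTTTTTACGTACGTACGT",
}
# A full database also holds non-type strains that were named from earlier top hits.
database = dict(type_strains)
database["strain 1 (labelled species A)"] = "TGGTACGTACGTACGTACGT"

query = "TGGTACGTACGTACGTACAA"

top_hit = best_match(query, database)            # closest sequence of any kind
closest_type = best_match(query, type_strains)   # closest type strain only

print(top_hit)
print(closest_type)
```

In this toy case the top hit is a non-type strain at 90% identity, whereas the closest type strain is only 80% identical. Naming the query from the top hit would stretch the boundary of "species A" one step further, which is the drift described above.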

The application of DNA sequence-based approaches has provided unprecedented opportunities to assess bacterial diversity with a level of resolution that could not be obtained using traditional culture-based methods. Sequence data also provide new opportunities to re-formulate species concepts and to test past species descriptions for accuracy. The ability to sequence complete bacterial genomes has provided the most significant advance in this regard and at the same time has added a new and unexpected level of complexity to the interpretation of species concepts. This complexity is amply demonstrated by observations that even well-studied bacteria such as Escherichia coli can share as little as 39% total protein content among different strains [42]. However, the realization that bacterial genomes maintain a mosaic structure with large numbers of acquired genes may be more an indication of an effective method of adaptation than evidence that sequence-based phylogenies created using the appropriate markers cannot be used to re-create accurate evolutionary histories. Advances in genome sequencing include the Genomic Encyclopedia of Bacteria and Archaea (GEBA) project, a collaborative effort between the Joint Genome Institute and the DSMZ, which aims to use the tree of life as a guide to select underrepresented taxa for sequencing. GEBA promises to add considerably to our understanding of species concepts and genome evolution by filling in major sequencing voids in the tree of life and by providing a vast amount of new information about the evolutionary history of individual genes and the various hosts in which they are observed.
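A shared-protein figure such as the one cited above for E. coli can be expressed as the fraction of protein families found in every strain relative to all families found in any strain. The sketch below assumes that definition; the strain names, protein family labels, and the `shared_fraction` helper are hypothetical, and the cited study may have used a different procedure.

```python
def shared_fraction(genomes: dict) -> float:
    """Protein families present in all strains divided by families present in any strain."""
    family_sets = [set(families) for families in genomes.values()]
    core = set.intersection(*family_sets)   # found in every strain
    pan = set.union(*family_sets)           # found in at least one strain
    return len(core) / len(pan)


# Toy protein-family content for three strains (labels are placeholders).
toy_genomes = {
    "strain_1": ["famA", "famB", "famC", "famD"],
    "strain_2": ["famA", "famB", "famE"],
    "strain_3": ["famA", "famB", "famC", "famF"],
}

fraction = shared_fraction(toy_genomes)
print(f"shared fraction: {fraction:.2f}")
```

Here two of six families are common to all three strains, so one third of the combined protein content is shared. The remainder corresponds to the strain-variable, often horizontally acquired, portion of the genome.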

HGT and secondary metabolism

The types of genes most susceptible to horizontal gene transfer are generally those associated with non-essential functions or adaptive traits. Included among these “auxiliary” genes are those involved in the biosynthesis of secondary metabolites [17]. Secondary metabolites are produced by large (in some cases exceeding 100 kb) gene collectives, which include genes encoding both biosynthetic and tailoring enzymes as well as mechanisms of resistance and transport. These biosynthetic pathways are responsible for the remarkable structural diversity observed among bacterial secondary metabolites [14], and their tight genetic clustering is readily amenable to HGT. There is ample evidence for the transfer of secondary metabolite biosynthetic genes among bacteria [28]. The horizontal exchange of these pathways provides a particularly rapid and effective method for individual cells to test new secondary metabolites for selective advantages as opposed to the more traditional concepts of mutation-driven evolution. The susceptibility of genes involved in secondary metabolism to HGT suggests they would not be linked to specific taxonomic groups, but instead should be strain-specific. Strain specificity creates a unique set of challenges for the effective discovery of new secondary metabolites as it necessitates the screening of a large number of strains, and a good dose of serendipity, to discover new products. It also suggests that secondary metabolite phenotypes are not reliable taxonomic markers.

Species concepts have not been at center stage in the search for new microbial secondary metabolites. Historically, the tendency to describe the producer of each new secondary metabolite as a new species led to an explosion of species descriptions and an over-classification of the genus Streptomyces [4]. Although revisions to this genus have since been made, it remains the most speciose prokaryotic taxon and the source of nearly 50% of the biologically active bacterial secondary metabolites discovered from microbial sources as of 2002 [6]. Despite this remarkable productivity, it is predicted that only a small percentage of the total number of compounds that can be produced by this genus have been discovered [41]. These predictions are supported by genome analyses of even well-studied species [5] where it is clear that the products of the majority of the biosynthetic pathways maintained by individual strains have yet to be discovered [32].

The large number of Streptomyces species that have been discovered to date is linked to the pharmaceutical industry's focused efforts on this taxon, which resulted in the isolation and screening of untold thousands of strains. In general, the consensus at the time when microbial drug discovery efforts were a high priority for the pharmaceutical industry was that secondary metabolite production was strain-specific [39], thus necessitating the incorporation of large numbers of strains into screening protocols. Strain specificity is what would be expected if biosynthetic pathways were being exchanged at random, and the products had little effect on the survival of the new host. Given the resolving power of the taxonomic tools available at the time, it can be understood why the perception of strain specificity may have been maintained, as many of what appeared to be the same Streptomyces species were undoubtedly found to produce different metabolites. It would be interesting to re-visit these strains using higher resolution molecular tools to document the extent to which pathways are maintained within closely related populations.

Diminishing returns on efforts to discover new secondary metabolites from soil streptomycetes contributed to a pharmaceutical industry-wide paradigm shift away from natural products towards other discovery platforms and the expansion of research programs to poorly studied environments in an effort to isolate new actinomycete taxa. The marine environment has become an important focal point in these efforts, and marine-derived actinomycetes are now recognized as an important source of new secondary metabolites [7, 11, 13, 26, 29]. Although only four new marine genera have been described to date [19, 30, 40, 43], there appears to be considerable diversity that is unique to the marine environment [18], especially when the phylum Actinobacteria is broadly considered [20, 36].

The genus Salinispora as a model for species concepts

Among the new actinomycetes that have been cultured from marine samples, the genus Salinispora has become an interesting model with which to address species concepts. To date, S. tropica and S. arenicola have been formally described [30], and the description of a third species (“S. pacifica”) is in preparation. These bacteria are widely distributed in marine sediments [21] and have also been reported from a marine sponge [23]. Salinispora spp. are a rich source of structurally unique secondary metabolites including salinosporamide A, which is currently in clinical trials for the treatment of cancer [12]. In addition to the small molecules themselves, the elucidation of the biosynthetic machinery involved in producing these unusual compounds has led to a number of interesting discoveries including a new chlorination mechanism [10] and a new series of extender units in polyketide biosynthesis [27].

The three Salinispora species are closely related, yet can be readily distinguished using all current metrics commonly applied to the delineation of bacterial species. The genus as a whole shares 99% 16S rRNA sequence identity, while the pair-wise species comparisons differ by as few as five nucleotides (Fig. 1). To date, no intraspecific 16S diversity has been detected for S. tropica, while only two single nucleotide changes have been observed in S. arenicola. Interestingly, both of these changes are restricted to populations recovered from the Sea of Cortez, providing the first evidence of biogeographical isolation (allopatric diversification) within a Salinispora species [21]. There is also evidence of geographical isolation between S. tropica (Bahamas only) and S. pacifica (yet to be recovered from the Atlantic). Although it is widely disseminated that the 16S rRNA gene evolves too slowly to be used for species-level determinations [37], this marker nonetheless has proven effective for the delineation of closely related species within the genus Salinispora.
Fig. 1 The three Salinispora species display little intra- (shown below species names) and inter-species 16S rRNA gene sequence diversity. dNTs = variable nucleotides

In contrast to the limited intraspecific Salinispora 16S sequence diversity that has been detected to date, most currently described bacterial species encompass considerably greater diversity. The inclusion of strains encompassing up to 3% 16S rRNA sequence divergence into a single bacterial species may in part be an artifact of trying to interpret sequence data in the context of past taxonomic assignments. This trend may have continued in part as a legacy of the report by Stackebrandt and Goebel [38], which stated that “species having 70% or greater DNA:DNA similarity usually have more than 97% 16S sequence identity.” Since a current requirement for the description of a bacterial species is that members generally share >70% DNA:DNA hybridization, it is possible that this statement has been interpreted to mean that 16S similarities as low as 97% can (or should) be used to circumscribe a bacterial species, although it is not clear that this was the authors' intent. In the case of some actinomycetes, it is now recognized that 16S values closer to 99% may be more realistic for species-level descriptions [31]. Although there is no reason to expect that one 16S value will be appropriate for all species-level units of diversity, or for that matter that this gene has sufficient resolution to delineate most species-level relationships, it is possible that many current species that encompass diverse 16S phylotypes may more appropriately represent genus level units of diversity or higher [25].
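To put the 16S values discussed above on a common scale, nucleotide differences can be converted to percent identity and back. The sketch below assumes a 16S rRNA gene length of roughly 1,500 nucleotides, which is an approximation introduced here and not a figure from the text; the helper names are hypothetical.

```python
import math

GENE_LENGTH = 1500  # assumed approximate length of the 16S rRNA gene, in nucleotides


def identity_from_differences(n_diff: int, length: int = GENE_LENGTH) -> float:
    """Percent identity implied by a given number of nucleotide differences."""
    return 100.0 * (1 - n_diff / length)


def max_differences(threshold_pct: float, length: int = GENE_LENGTH) -> int:
    """Largest whole number of differences that still meets an identity threshold."""
    return math.floor(length * (100 - threshold_pct) / 100)


five_nt = identity_from_differences(5)   # species pair differing by five nucleotides
at_97 = max_differences(97)              # differences tolerated by a 97% threshold
at_99 = max_differences(99)              # differences tolerated by a 99% threshold

print(round(five_nt, 2), at_97, at_99)
```

Under this assumption, a five-nucleotide difference corresponds to roughly 99.7% identity, while a 97% threshold would tolerate about 45 differences and a 99% threshold about 15. The Salinispora species therefore sit well inside the range that a 97% cutoff would lump into a single species.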

Despite the close phylogenetic relationships among the three Salinispora species, other metrics used to delineate bacterial species are also maintained when applied to the genus. In terms of DNA:DNA hybridization, the pair-wise comparisons of all three species yield values <70%. The average nucleotide identity (ANI) calculated from a comparison of 3,606 orthologous genes from the genome sequences of S. tropica (strain CNB-440) and S. arenicola (strain CNS-205) was 87% [35], which is well below the suggested cutoff value of 94% that was recently proposed to delineate genomic species [25]. Although these metrics are not based on ecological or evolutionary theory and therefore their use can be justifiably debated, they can be considered conservative, and in many cases will likely lead to the grouping together of lineages that should more accurately be considered independent.
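The ANI comparison can be sketched as the mean percent identity across aligned orthologous gene pairs, followed by a check against the 94% cutoff mentioned above. This is a simplified illustration under stated assumptions: the gene fragments are made up, the helper names are hypothetical, and published ANI values are obtained with dedicated genome alignment pipelines, not with this function.

```python
from statistics import mean


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def average_nucleotide_identity(ortholog_pairs: list) -> float:
    """Unweighted mean of per-gene percent identity over orthologous gene pairs."""
    return mean(percent_identity(a, b) for a, b in ortholog_pairs)


def same_genomic_species(ani: float, cutoff: float = 94.0) -> bool:
    """Apply the proposed ANI cutoff for genomic species."""
    return ani >= cutoff


# Three toy orthologous gene pairs (10 nt each, illustration only).
toy_pairs = [
    ("ACGTACGTAC", "ACGTACGTAC"),  # identical
    ("ACGTACGTAC", "ACGTACGTTT"),  # two differences
    ("AAAACCCCGG", "AAAACCCCGT"),  # one difference
]

toy_ani = average_nucleotide_identity(toy_pairs)
print(toy_ani, same_genomic_species(toy_ani))
```

Applied to the value reported in the text, `same_genomic_species(87.0)` returns `False`, consistent with S. tropica and S. arenicola being treated as separate genomic species.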

In addition to being able to distinguish Salinispora species using sequence-based approaches, the three species can also be delineated by phenotypic traits. Surprisingly, the most dramatic of these is secondary metabolite production. Although species-specific secondary metabolite profiles have been reported for fungi [15], it is not clear that similar patterns have been widely observed among actinomycetes. In the case of Salinispora species, the patterns are dramatic, with all S. arenicola strains examined to date producing compounds in the rifamycin and staurosporine classes, regardless of the global locations from which the strains were collected [22]. In the case of S. tropica, all strains produce compounds in the salinosporamide and sporolide classes, while none have been reported to produce the S. arenicola metabolites. Given that both of the classes of compounds observed in S. arenicola were originally reported from other bacteria and thus HGT can be inferred, these observations came as a surprise and contradict the concept that secondary metabolite production is strain-specific. Clearly the fixation of specific pathways among globally distributed populations is powerful evidence of selection and implies that the products of these pathways have important ecological functions.

A comparison of the complete genome sequences of S. tropica and S. arenicola supports the species specificity of their core sets of secondary metabolites and provides new insight into the processes that drive diversification in the genus [35]. The most prominent feature emerging from the alignment of the two genome sequences is the concentration of species-specific genes in islands. Genomic islands are sites within which niche-specific genes are concentrated and ecological adaptations among closely related species resolved [8]. As might be expected, the biosynthetic genes linked to Salinispora species-specific secondary metabolite production are located on genomic islands. These biosynthetic operons are dynamic entities that hop among islands and display strong evidence of horizontal gene transfer suggesting that they are active sites of genome evolution and ecological adaptation [35]. Despite the relatively close relationship between the two Salinispora spp. for which genome sequences are available, the species specificity of certain pathways suggests that secondary metabolism can represent an important taxonomic feature for this group. Ongoing efforts by the J. Craig Venter Institute to sequence four additional Salinispora strains as well as large numbers of streptomycetes by the Broad Institute will provide a valuable new window within which to assess the distribution of biosynthetic pathways among actinomycetes.

Based on the results obtained to date from the three Salinispora species, it appears that a diversity-based approach targeting the cultivation of new actinomycete phylotypes provides an effective strategy for natural product discovery. However, there is no reason to expect that the compounds produced by a new phylotype will include new chemical structures as they may simply represent structures that are new relative to other closely related species. An alternative discovery approach is to evaluate the diversity of the biosynthetic genes maintained by individual strains prior to fermentation and chemical evaluation. Phylogenetic analyses of biosynthetic genes are providing new methods to predict secondary metabolite production and thus improve the discovery process. One of the best examples comes from analyses of ketosynthase domains derived from polyketide synthase genes. These domains tend to cluster together based on the compound produced [17], allowing sequence analyses of unknown strains to be used to predict the presence of specific biosynthetic pathways and their products (Gontang et al., submitted). Likewise, there is growing evidence that NRPS-derived condensation domains can be used for similar purposes. Thus, as more pathways are sequenced and experimentally characterized, it will become increasingly effective to use molecular screens to rapidly identify strains with the greatest genetic potential to produce new secondary metabolites or compounds within a targeted class of metabolites. This type of screening paradigm will allow more time to be spent varying fermentation conditions and monitoring pathway expression in a few high-quality strains as opposed to spending limited time on large numbers of strains that possess unknown biosynthetic potential.
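The idea of using ketosynthase (KS) domain sequences to predict pathway products can be sketched as a nearest-reference lookup: a query domain is assigned the compound class of its most similar characterized domain, and a query with no close match is flagged as a candidate for new chemistry. The reference sequences, class labels, the 0.85 identity threshold, and the `predict_product` helper are all hypothetical; a real analysis would rely on phylogenetic trees built from curated alignments.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def predict_product(query: str, references: dict, min_identity: float = 0.85):
    """Return the class of the closest reference domain, or None if no reference is close."""
    best_class = max(references, key=lambda c: identity(query, references[c]))
    if identity(query, references[best_class]) >= min_identity:
        return best_class
    return None  # no characterized relative: candidate for a new compound class


# Toy KS domain fragments from characterized pathways (placeholders).
reference_domains = {
    "compound class A": "MKTLLVAGSG",
    "compound class B": "MRSIIVCGTA",
}

known_like = predict_product("MKTLLVAGSA", reference_domains)   # close to class A
novel_like = predict_product("MPQWWYHHNN", reference_domains)   # close to nothing

print(known_like, novel_like)
```

A strain whose domains all map to well-known classes would be a low priority for fermentation, whereas a strain returning `None` for one or more domains would merit closer chemical study. The threshold is arbitrary and would need calibration against experimentally characterized pathways.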


Sequence-based approaches are providing new opportunities to assess bacterial diversity and resolve the relationships among groups of related strains that maintain the fundamental properties we expect from species-level taxa. Sequence-based approaches also have the capacity to provide an estimate of the genetic potential of strains to produce specific classes of secondary metabolites and to reveal the evolutionary relationships of these genes with respect to their hosts. Surprisingly, Salinispora species produce secondary metabolites in species-specific patterns. The ecological and evolutionary significance of this observation is not yet understood, nor is it known how broadly it may apply to other actinomycetes. If it is a general feature of these bacteria, it may be possible to develop a secondary metabolite-based chemotaxonomy for actinomycetes, as has been observed for fungi, plants and some marine invertebrates. However, the susceptibility of actinomycete biosynthetic pathways to horizontal gene transfer suggests there may be limits to this type of approach. The patterns of secondary metabolite production observed in the Salinispora example nonetheless suggest that the search for new actinomycete diversity represents a useful strategy for compound discovery, especially if this new diversity is derived from poorly explored environments where habitat-specific challenges to survival may select for the production of new metabolites. This strategy, coupled with the analysis of biosynthetic gene diversity, has the potential to reduce the redundant isolation of known compounds and allow more time to be devoted to those strains with the greatest potential to produce new secondary metabolites. What remains to be seen is which sequence-based approaches will ultimately prove most effective and become broadly employed by the natural products research community.

The field of microbiology is at a crossroads where the fundamental goal of assigning species names to bacteria is in danger of being abandoned. The growing acceptance of strain names instead of formal species descriptions portends a general lack of interest in taxonomy. If continued, this disinterest may leave large regions of taxonomic space filled with strain names but lacking the anchor points provided by formal species descriptions. Although efforts to create anchor points without formal species descriptions have been made [3], this is a precedent that will not necessarily be widely adopted and runs the risk of being applied without a standardized format. Assigning strain names is an easy solution to the growing problem that few are interested in, or being trained in, bacterial systematics. It also avoids the very real problem that current methods to delineate species boundaries are insufficient and most likely inaccurate. A re-invigoration of the field in ways that make it less onerous to describe new species, possibly by the elimination of procedures that bog down the descriptive process in favor of increased reliance on methods that are generally available in most microbiology labs, may reduce this problem. At the same time, a working species concept that can be broadly applied to bacteria is needed. If these challenges can be met, there is reason to believe that the vast majority of bacterial diversity, possibly including that which has yet to be cultured, could be described using genus and species nomenclature over the next few generations. This is a goal that should be readily embraced as opposed to settling for strain names that lack taxonomic precision and will create communication challenges that will haunt future generations of microbiologists.


This work was supported by the California Sea Grant Program (R/NMP-98), NOAA Grant NA08OAR4170669 and the National Institutes of Health Grant GM085770.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Copyright information

© The Author(s) 2009

Authors and Affiliations

  1. Scripps Institution of Oceanography, University of California San Diego, San Diego, USA
