The Interplay of Homologous Recombination and Horizontal Gene Transfer in Bacterial Speciation
Bacteria experience recombination in two ways. In the context of the Biological Species concept, allelic exchange purges genic variability within bacterial populations as gene exchange mediates selective sweeps. In contrast, horizontal gene transfer (HGT) increases the size of the population’s pan-genome by providing an influx of novel genetic material. Here we discuss the interplay of these two processes, with an emphasis on how they allow for the maintenance of genotypically cohesive bacterial populations, yet allow for the separation of these populations upon bacterial speciation. In populations that maintain genotypic similarity by frequent allelic exchange, horizontally transferred genes may initiate ecological barriers to genetic exchange. The resulting recombination interference allows for the accumulation of neutral mutations and, consequently, the imposition of a pre-mating barrier to gene transfer.
Keywords: Recombination, speciation, periodic selection, recombination interference, horizontal gene transfer, cohesion, species concepts
The identification of an organism is a fundamental step in deciphering that organism’s biology; the power of classification is the implicit understanding of what that organism is likely to do, or is capable of doing, based on past experience with similar organisms. The species is one of the most fundamental and recognizable units of biological organization. The term is used by both biologists and non-biologists in its original, Aristotle-born meaning to encompass a group of individuals – organisms, objects, or thoughts – which share commonality as well as properties that distinguish members of one species from those of other species. In his Scala Naturae-based treatise on animal classification, Aristotle identified almost all of the extant species of mammals in Europe using only this general principle as a guide (1), easily distinguishing between related species (e.g., three kinds of herons) while ignoring variability among individuals of the same species. While Linnaeus offered a more complex, hierarchical classification in Systema Naturae – placing species as the most fundamental unit of taxonomy below Genus, Family, Order, Class, Phylum, and Kingdom (2) – the basis for assigning organisms to species was not substantially more sophisticated than that used by Aristotle. Arguably, most schoolchildren can classify oft-encountered organisms into species with relative ease, identifying similar organisms as members of named groups and recognizing differences between groups. Indeed, children as young as four months of age are able to categorize animals such as cats and dogs in a way that distinguishes between the groups even as they recognize different individuals within each group (3). So well-entrenched is the “common-sense” perception of species boundaries that the United States’ Endangered Species Act of 1973 includes a definition of endangered species that fails to delineate or define a species itself (4).
In the court of public opinion, the term “species” lies in that nebulous group of words whose definitions may be difficult to state precisely, but which can be recognized with ease: the “I know it when I see it” phenomenon.
This has been an unsatisfying state for biologists ever since Darwin rephrased the study of biological diversity as the origination of new species from old ones (5). The species played a central role in his view of life: members of the same species produced offspring that competed with one another, with only the fittest surviving. The two fundamental properties of species are evident here. First, there are genetic and ecological commonalities in members of a species in sharing the ability to reproduce with one another, as well as to compete with one another for resources. Second, there is distinction between different species, where members of different species could stably co-exist due to the lack of niche overlap. Beyond this, Darwin introduced a third concept: different species are not only distinguishable from one another but at least one must also be distinct from their “parent” species. Here, speciation is the evolutionary process by which species originate and become distinct from their ancestor. Also explicit in his model is the idea that each species originates from a single ancestral species, so that descendants may be organized into hierarchical groups-within-groups, just as Linnaeus had done. Here, we examine how bacteria may be classified into species, and how HGT impacts this process.
2.1 Why Species Matter
Beyond the purely scientific interest in defining natural groups that represent organisms with shared evolutionary trajectories, species’ names impact the non-scientific world in many critical ways. Because members of a species share properties and behaviors, we rely upon their proper identification to generate an appropriate response. Nowhere, perhaps, is this more true than with microbial species. The response to finding spores of Bacillus subtilis versus those of Bacillus anthracis (the causative agent of anthrax) would be very different. Identification of bacterial species plays a central role in medical diagnosis, food safety, public health, biotechnology, and response to bioterrorism. Thus species delineation has a practical use, providing professional microbiologists with a common language to discuss the biology of important groups of microorganisms.
2.2 Species Definitions and Species Concepts
There are two approaches to delineating the boundaries of microbial species. First, one may define a species in any way deemed appropriate so that it encompasses strains that share phenotypes of interest. This is often the tactic taken in delineating pathogenic species: strains of a particular species are the causative agents of particular diseases. Such a description is often called a species definition. Species definitions may use broadly applied methods, such as defining bacterial species to encompass strains which share >70% similarity based on DNA hybridization, or which share >97% identity at their 16S rRNA genes (6). These methods are useful for assigning names to bacteria, but the similarity of the organisms grouped this way may only passively reflect their common ancestry.
Alternatively, a species may represent a group of strains that share common features due to some active underlying biological process. These descriptions are termed species concepts, because it is the process behind the classification that is important. Here, the similarity among organisms within a species reflects a process that acts to maintain that similarity; it is not merely the reflection of common ancestry. In this way, species – as delineated by a species concept, not a species definition – are more meaningful groups than are genera, families, orders, classes, or phyla, which owe the similarity of their members to ancestry alone. As a result, the inclusion or exclusion of taxa from higher taxonomic groups is arbitrary, since no active biological process maintains the similarity among members of the group.
The biological literature is rife with discussion of how to identify, classify, or delineate species, and of what factors influence their origins. All are directed at answering the same questions outlined by Darwin: (a) how and why do members of the same species share similarity, (b) how and why do they maintain distinctiveness relative to related species, and (c) how did they come into being? One biological process which promotes genotypic similarity among species members is gene exchange. Not surprisingly, homologous recombination plays a central role in many concepts of species and speciation. Mayr proposed that species’ members share a common gene pool, and that high-frequency gene exchange among groups of con-specific individuals is what provided genotypic (and thus phenotypic) cohesion within species (7,8). The process of speciation, then, would amount to the imposition of barriers which prevented facile gene exchange. Paterson similarly proposed that a shared mate-recognition system would lead to genotypic cohesion (9); here, speciation would entail development of different mate-recognition systems, rather than any direct barrier to gene exchange. In either case, exchange of genes among con-specific individuals works to prevent the gradual accumulation of differences that result from ongoing mutation.
Other, more broadly defined species concepts do not invoke recombination per se, but recognize the intrinsic similarity of species members. Van Valen proposed that members of the same species share an ecological niche (10), necessitating a phenotypic cohesion that likely stems from genotypic similarity. Wiley proposed that members of a species share the same evolutionary trajectory (11), thus also reflecting phenotypic (and thus genotypic) similarity. Most broadly, Templeton argued that species members share cohesion mechanisms, regardless of what they may be (12). Below, we see that all of these ideas have bearing on the identification and delineation of groups of bacteria that share genotypic and phenotypic similarity imparted by cohesion mechanisms, and have shared ecologies and evolutionary trajectories.
3 Cohesion in Bacterial Species
3.1 Genotypic Cohesion Can Be Imparted by Periodic Selection
In bacteria, genotypic differences between strains arise at every cell division as mutations inevitably occur. These differences are non-randomly distributed among strains; that is, there appear to be several mechanisms through which similarity is maintained among a larger group of strains than one would expect at random. One mechanism is periodic selection.
In many ways, ecotypes have properties that are associated with species. Similarity is maintained among individuals in a population by an active process, groups are clearly differentiated from one another by ecological distinctiveness, and there is a mechanism (fixation of beneficial mutations) that can lead to lineage separation. Thus ecotypes could be considered one of the most fundamental units of organization of bacterial strains. But from a conceptual standpoint, ecotype boundaries can only be established by elucidating the ecological niche of a bacterial strain. This may not be feasible, as it requires the assessment of the limits of “ecologically neutral” genetic changes. That is, one must determine which differences among bacteria are sufficiently large to place them in different niches, and therefore be different ecotypes. However, the scope of an ecotype – that is, the boundaries of the population encompassed by a periodic selection event – is a function of the nature of the beneficial mutation driving periodic selection. Mutations of small benefit would define a narrow ecotype, whereas those with greater benefits would purge variability among a group of more diverse strains.
From a practical standpoint, it is also unlikely that ecotypes will replace current species standards. First, it has been estimated that many named bacterial species – like Escherichia coli – contain hundreds or thousands of ecotypes. The utility of bacterial species definitions in medical diagnosis, food safety, and bioterrorism is the rapid assessment of the biological properties of a strain based on crude estimates of its relatedness to other strains. For now, the properties of many bacteria that garner the strongest interest encompass far more strains than are found in an ecotype. Partly, this is because beneficial alleles that arise within an ecotype may also spread to a much larger, and more diverse, set of strains via homologous recombination. Second, objectively assessing “ecologically neutral” differences is impossible when considering sequence differences alone. As a result, ecotypes with particularly interesting phenotypes – e.g., Mycobacterium tuberculosis or Bacillus anthracis – may be classified as species (6).
3.2 Genotypic Cohesion Can Be Imparted by Homologous Recombination
3.2.1 Bacteria Exchange Genes
Recombination in bacteria does not involve the exchange of haploid genomes, but rather the unidirectional transfer of small fragments of DNA between donor and recipient (Fig. 3.1 B). Here, DNA may be moved between cells by one of three mechanisms (17). Transduction occurs when bacteriophages mistakenly package bacterial DNA into their capsids instead of viral DNA. When this particle finds a target cell, the DNA – limited in size to a fragment that will fit in the capsid – is injected. Transformation occurs when a bacterial cell imports fragments of naked DNA directly from the environment; this is common among bacteria that consume DNA as a source of food. Conjugation occurs when a plasmid that has integrated into the bacterial chromosome begins its process of replication and transfers into another host. Plasmid DNA is transferred directly from the cytoplasm of the donor cell into the cytoplasm of the recipient, thus requiring prolonged cell–cell contact. Conjugation can move large portions of chromosomal DNA between cells.
After the DNA has been injected into the cytoplasm of the recipient cell, it is subjected to five important processes. First, restriction endonucleases will cleave almost all incoming DNA fragments, with the exception of DNA arriving from a cell expressing the same hsd-encoded restriction/modification system, whose cleavage sites have thus been protected. Given the variability in hsd genes within and among bacterial species (18,19,20), this exception is rare, even within named species. Second, exonucleases will degrade the double-stranded (ds) DNA ends of the resulting fragments. These two processes act in concert to reduce the size of incoming DNA fragments and, most often, prevent the DNA from integrating into the recipient chromosome. Third, RecA-mediated homologous recombination may occur, whereby the incoming DNA fragment – reduced in size through the action of nucleases (21) – is integrated into the chromosome, replacing the resident allele at its cognate position. This requires nucleotide sequence identity between incoming and resident DNA, and the occurrence of mismatches reduces the probability of successful recombination (22). Fourth, if no region of similarity exists between the incoming and the resident DNA, illegitimate recombination may occur, placing the arriving DNA anywhere in the chromosome or, alternatively, site-specific recombinases (e.g., phage integrases) may catalyze recombination into specific locations. Lastly, persistence of any newly acquired genes rests on the interplay of stochastic processes (which may lead to the loss of genes that are potentially advantageous) and natural selection which, as is discussed below, is the final arbiter of gene fate in bacterial genomes.
3.2.2 Two Kinds of Gene Exchange
When viewed this way, gene exchange encompasses two distinct processes. First, genes may be added to a recipient genome after being transferred from a potentially distantly related taxon. This process is often termed horizontal (or lateral) gene transfer, and it is this process that is the topic of this book. Second, alleles may be exchanged between closely related taxa, resulting in gene conversion. This process is often called “recombination,” alluding both to the role of the homologous recombination machinery in catalyzing allele replacement and to the population genetic process of reducing linkage disequilibrium (23). Because the homologous recombination machinery is so efficient, it will catalyze gene conversion events almost to the exclusion of illegitimate recombination events when the incoming DNA is highly similar to resident genes. Often the two processes are studied separately: HGT is viewed as occurring between different species, and recombination occurs primarily within species. Here, we argue that they are intimately associated, with each process affecting the scope and impact of the other; it is this association that also affects the process of lineage diversification – speciation – in bacteria.
3.3 Gene Transfer Among Related Organisms: A Bacterial Species Concept
Dykhuizen and Green also invoked homologous recombination in their encapsulation of Mayr’s biological species concept (23). Their model is a retrospective one, using the patterns of genetic diversity among individuals to delineate species boundaries. The proposal was that the relationships among individuals as inferred from different genes would not be congruent within a species, but they would be congruent between them. Thus the Dykhuizen and Green model directly invokes homologous recombination as a cohesion mechanism within bacterial species: strains of the same species exchange genes, resulting in incongruent phylogenies among different genes.
This model works well when applied to some groups of bacteria. For example, different genes among different strains of the enteric bacteria E. coli or Salmonella enterica show different relationships, reflecting homologous recombination within these groups (23,24). Yet different genes among different enteric bacteria show congruent relationships, implying that homologous recombination does not readily exchange genes across the boundaries of these named species (25); see also Chapter 21. When many strains within named species are characterized by Multi-Locus Sequence Typing (MLST) – wherein alleles at a handful of shared loci are genotyped (26) – it is clear that many bacterial species have appreciable rates of homologous recombination among constituent strains (27,28,29,30,31).
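The congruence criterion can be illustrated with a minimal sketch. It assumes each gene tree is written as a nested tuple of strain names and treats two trees as congruent only when they contain exactly the same clades. This is a deliberate simplification: published analyses rely on statistical tests of congruence rather than strict topological identity, and the strain and taxon labels below are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf names under a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clades(tree):
    """Return the set of clades (leaf sets of internal nodes) in a rooted tree."""
    if isinstance(tree, str):
        return set()
    found = {leaves(tree)}
    for child in tree:
        found |= clades(child)
    return found

def congruent(tree_a, tree_b):
    """Two gene trees are treated as congruent if they imply the same clades."""
    return clades(tree_a) == clades(tree_b)

# Hypothetical gene trees for four strains of one species: the groupings differ,
# as expected when recombination shuffles alleles among strains.
within_gene1 = (("strain1", "strain2"), ("strain3", "strain4"))
within_gene2 = (("strain1", "strain3"), ("strain2", "strain4"))

# Hypothetical gene trees for representatives of three species: the groupings
# agree, as expected when genes are not exchanged across species boundaries.
between_gene1 = (("speciesA", "speciesB"), "speciesC")
between_gene2 = ("speciesC", ("speciesB", "speciesA"))

print("within species congruent:", congruent(within_gene1, within_gene2))
print("between species congruent:", congruent(between_gene1, between_gene2))
```

Comparing clade sets rather than the tuples themselves makes the check independent of the order in which branches are written.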
The contrasts between groups of organisms encompassed by homologous recombination and those delineated by periodic selection (the ecotype model) are clear. Ecotypes encompass strains with sufficiently similar ecologies that periodic selection events may purge all of the variability when beneficial mutations sweep the population, with the entire bacterial chromosome hitchhiking along. Yet this beneficial allele may also rise to high frequency among a much larger, and more diverse, set of strains via homologous recombination (32,33). Recombination among strains also works against the gradual divergence between strains that accompanies the ongoing accumulation of mutations. Here, genotypic cohesion is imparted among a much larger group of strains than a single ecotype, and their ecologies and evolutionary trajectories are closely tied to one another. The relationships among genomes at this scale are no longer dictated by lineage-specific mutations, but are instead shaped by gene exchange among strains. Thus the species concept outlined by Dykhuizen and Green resembles that defined by Mayr.
Yet while applicable to some groups of bacteria, this model fails for others such as species of the genus Neisseria. Here, homologous recombination allows gene exchange between named species at some loci, thereby increasing genotypic diversity within species rather than reducing it. As will be discussed further below, these cases represent points along the path establishing genetic isolation between species.
4 Speciation in Bacteria
4.1 The Barrier to Gene Exchange Between Species
Analysis of MLST data has revealed that natural populations of bacteria do generally fall into distinct sequence clusters that reflect commonly recognized microbial species, even when those species are closely related and highly recombinogenic (34, 35). If recombination acts to exchange alleles within a bacterial species, what prevents transfer across species boundaries? There are three potential boundaries.
First, gene exchange may be blocked before DNA even enters the cytoplasm, thereby producing a speciation mechanism comparable to classical (whole genome) pre-mating isolation. For example, bacteriophage host ranges may be limited to closely related strains, thus mediating transduction only within species. This does not appear to be the case, as those bacteriophages assayed show variable host ranges, many of which include many named bacterial species. More importantly, bacterial species may be infected by numerous bacteriophages, each with a different host range. For example, E. coli is infected both by bacteriophage lambda, which has difficulty infecting other enteric bacteria due to differences in the LamB receptor protein, and bacteriophage P1, which infects many enteric bacteria. Indeed, genes encoding the P1 tail-fiber proteins have been used to create vectors for mutagenesis across numerous enteric bacterial species (36). Barriers to recombination may be supplied by geographical barriers among sexual eukaryotes; if gene exchange is tied to reproduction, then the inability to find mates would curtail recombination and thus lead to allopatric speciation. But gene transfer in bacteria does not require that the donor and recipient be in the same place at the same time. Because bacteriophages can travel large distances and shelter donor DNA for long periods of time, geographic barriers – while clearly slowing down migration and/or recombination in prokaryotes (37, 38, 39) – are not an absolute isolating mechanism for most bacteria (excepting obligate intracellular symbionts like Buchnera or any taxon where vicariance in the host population represents an ecological barrier to associated bacterial species).
Second, DNA moved into a recipient cell’s cytoplasm may fail to undergo recombination as it has accumulated too many differences. Here, the mismatch-repair system recognizes the duplex between resident DNA and incoming DNA and prevents successful exchange. It has been shown in several bacteria that the efficiency of homologous recombination decreases exponentially with linear decreases in sequence identity (40,41,42,43,44), and it is effectively impossible (except for very short blocks of sequence) when nucleotide sequences are greater than 5% different (45). Thus, DNA is not efficiently exchanged between cells whose DNA has become too different. However, the occasional strain with a defective mismatch-repair system may have alleles converted by DNA with much lower sequence identity (40); after recovery of the mismatch-repair system (46), the recombinant and donor lineages may exhibit facile recombination at the converted locus, even as other loci of the recombinant lineage lack sufficient sequence identity to recombine with the donor lineage (47). This type of locus-specific control over recombination may account for the “fuzzy species” boundaries inferred from MLST studies (34).
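The exponential relationship between sequence divergence and recombination efficiency can be written as a log-linear model. The sketch below is illustrative only: the function name and the decay constant `slope` are assumptions chosen for demonstration, not values measured in any of the cited studies.

```python
import math

def relative_recombination_efficiency(divergence, slope=20.0):
    """Relative efficiency of recombination (1.0 for identical sequences).

    divergence: fraction of nucleotides that differ (0.05 means 5% different).
    slope: hypothetical decay constant of the log-linear model.
    """
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be a fraction between 0 and 1")
    return math.exp(-slope * divergence)

# Each equal step in divergence multiplies the efficiency by the same factor.
for d in (0.0, 0.01, 0.02, 0.05, 0.10):
    efficiency = relative_recombination_efficiency(d)
    print(f"divergence {d:.2f} -> relative efficiency {efficiency:.3g}")
```

In practice the decay constant would have to be estimated separately for each organism, since the cited experiments were performed in different bacteria.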
Third, DNA may be exchanged, but the resulting recombinant may be counterselected if it is less fit than its two parents (although, in practical terms, it would likely only compete with its nearly isogenic maternal parent). This is comparable to post-mating isolation mechanisms in eukaryotes, while the previously mentioned mechanisms can be classified as pre-mating due to the fact that no recombinant organism is ever produced. In situations where the above pre-mating mechanisms do not reduce recombination to low rates, we believe that counterselection of recombinants will play a decisive role in establishing species boundaries by eliminating recombinants. This is discussed further below.
4.2 The Mutation–Recombination Balance
For recombination to act as a cohesion mechanism for bacterial species, it must purge the variability – on a locus by locus basis – faster than it accumulates by mutational processes (Fig. 3.2). If the rate of recombination is low, strains will diverge more quickly than recombination can make them similar; eventually, the DNA will become sufficiently different that homologous recombination cannot exchange genes between the separate lineages (48,49). Thus, “speciation” will have occurred as the inevitable result of mutation. If recombination merely deters the divergence of strains by mutation, then groups of strains that are similar via recombination are transient and, in some ways, artifactual. If speciation is the inevitable result of stochastic mutation, then there are no shared ecological properties of the individual species; species are merely groups of strains that have yet to experience sufficient numbers of mutations to prevent recombination. This is not a satisfying way to delineate species, for the selection of strains to include in a species would be arbitrary.
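The balance between mutation and recombination can be explored with a small deterministic sketch. It tracks the divergence between two lineages at a single locus, adds mutations in both lineages at each step, and removes divergence through recombination whose success is assumed to decline exponentially as the sequences diverge. The update rule, the function name, and every parameter value are hypothetical, chosen only to show the two qualitative outcomes.

```python
import math

def simulate_divergence(recombination_rate, mutation_rate=1e-4,
                        slope=20.0, steps=1000):
    """Return the divergence between two lineages at one locus after `steps`.

    All rates are hypothetical and expressed in arbitrary units per step.
    """
    divergence = 0.0
    for _ in range(steps):
        # Mutations accumulate independently in both lineages.
        gain = 2.0 * mutation_rate
        # Recombination replaces a divergent allele, but succeeds less often
        # as divergence grows (assumed log-linear decline).
        loss = recombination_rate * math.exp(-slope * divergence) * divergence
        divergence = min(1.0, max(0.0, divergence + gain - loss))
    return divergence

for r in (0.0, 0.005, 0.05):
    final = simulate_divergence(r)
    print(f"recombination rate {r}: final divergence {final:.4f}")
```

Under these assumed values, the highest recombination rate holds divergence at a low equilibrium, whereas the lower rates allow divergence to keep growing until recombination is effectively shut off.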
4.3 Ecological Barriers to Recombination
This barrier – termed recombination interference – is outlined in Fig. 3.3; the orthologous DNA is sufficiently similar such that RecA-mediated recombination is not inhibited if strand invasion occurs. Here, we have indicated an addition of a small, adaptive gene cluster to one of the two genomes. There are several ways recombination may be inhibited in the vicinity of adaptive loci. First, recombination events that eliminate these genes would be counterselected because the recipient would lose important genes (Fig. 3.3, class A). Second, recombination events that move these genes to a naive host may, in fact, be detrimental (class B). For example, the cadA and ompT loci are present in E. coli, but not in the highly similar strains of Shigella (54, 55). Recombinants moving Shigella DNA lacking these genes into its cognate position in an E. coli strain would eliminate these genes, making a less-fit recombinant. It is also known that introduction of these genes from E. coli to Shigella interferes with virulence (56, 57). Thus, recombinants in either direction may be counterselected. Third, from a purely mechanistic standpoint, fragments with dsDNA-ends within genes found only in the donor will not be effective substrates in the recipient, further reducing recombination at the shared genes flanking this adaptive locus (class C).
This post-mating barrier will eventually result in a pre-mating barrier (58). The lack of recombination resulting from the lack of fit recombinants allows for the accumulation of neutral mutations in the region surrounding adaptive loci. Therefore, the genomes will become gradually more divergent in the genes immediately adjacent to adaptive loci because variance-purging selective sweeps cannot occur at those loci. Eventually, sufficient numbers of differences will accumulate, and DNA heteroduplexes between the two versions will be rejected by the mismatch-correction apparatus. In the end, the ecological differences that produce post-mating barriers result in robust pre-mating barriers. Given the infrequency of recombination in bacteria relative to eukaryotes, it is unlikely that pre-mating barriers would result from selection to prevent the formation of less-fit progeny.
It is clear that species of the genus Neisseria represent groups at the midpoint of this transition. Different species have sufficiently different ecologies that they can be readily classified as different species, yet there is also recombination across species boundaries at some loci, those that do not affect each population’s niche specificity (34). Given time, these species will likely develop pre-mating barriers, and the recombination across species boundaries will not necessarily lead to lineage coalescence. That is, we expect emerging species to continue to exchange genes across species boundaries so long as recombinants are not unfit. Alternatively, recombination across species boundaries may result in the coalescence of two emerging species, reuniting their gene pools (59). This is occurring in two Campylobacter species, possibly as the result of expansion of the two populations into a shared ecological niche. Thus erasure of ecological differences between the two lineages leads naturally to the merger of the once diverging lineages (59).
4.4 Species in Pieces
Two questions arise from this fragmented process of speciation. First, what is the time frame over which complete genetic isolation occurs? Traditionally, the time over which lineage separation occurs is thought to be relatively short compared to the time between successive divergence events. That is, in visualizing speciation as a series of branches on a phylogenetic tree, the variance of each node is considered to be small relative to the inter-node distance. Given that a process relying on ecological barriers entails both the accumulation of adaptive differences at numerous chromosomal loci and the time to accumulate mutational differences at shared loci, this is not an insignificant time in bacteria. Second, the Dykhuizen and Green species concept groups strains with incongruent phylogenies among genes into the same species; members of different species would have congruent phylogenies because recombination between them is absent. Yet consider genomes midway through the process outlined in Fig. 3.4. Here, the genomes are exchanging genes at some loci but not at others. If many such lineages were diverging simultaneously from a single parent population, then the phylogenies of some genes may be congruent – thereby placing them in different species – whereas the phylogenies of others may be incongruent due to continued recombination at those loci. Thus, such lineages would be considered different species at some genes but the same species at other genes. If the process of lineage separation is rapid, then this is not an issue. But if the process takes significant amounts of time, this confounds our ability to separate strains neatly into distinct, non-overlapping species.
4.4.1 Time Frame for Lineage Divergence
To elucidate the time frame over which gene divergence takes place, one needs to calculate the time of divergence for all orthologous genes in two populations that no longer experience any recombination. One can then determine if the time frame which encompasses 95% of the divergence times (intra-node variance) is very small relative to the average time between genome divergence events (inter-node length). The average time of divergence is often inferred from the average nucleotide divergence at synonymous sites or divergence at 16S rRNA-encoding genes; rates of change at these sites have been calibrated to historical events in the fossil record (60,61). The difficult task, then, is to calculate the time of divergence for individual genes relative to the average. Nucleotide divergence is a product of the rate of divergence and the time of separation: divergence increases both with increased rate of change and with increased time of separation. Variation in nucleotide divergence among genes has traditionally been attributed to two sources: variation in rate, and error. Yet variation in time may also be a contributor if the process of speciation takes a long period of time. If divergence is corrected for variation in rate, a component of the residual variance in divergence may be attributed to variation in divergence time.
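The rate correction described above can be sketched in a few lines. The example assumes that divergence equals rate multiplied by time for each gene, so that a relative divergence time is recovered by dividing divergence by a gene-specific rate; it then compares the spread of those times with an inter-node length. All gene names and numbers are hypothetical, and the sketch ignores the error term that a real analysis would need to model.

```python
def divergence_times(divergence, rate):
    """Estimate a relative divergence time per gene as divergence / rate."""
    return {gene: divergence[gene] / rate[gene] for gene in divergence}

def central_span(values, coverage=0.95):
    """Width of the interval holding the central `coverage` fraction of values."""
    ordered = sorted(values)
    n = len(ordered)
    drop = int(n * (1.0 - coverage) / 2.0)  # values trimmed from each tail
    kept = ordered[drop:n - drop]
    return kept[-1] - kept[0]

# Hypothetical inputs: divergence per site and relative rate for each gene.
divergence = {"geneA": 0.10, "geneB": 0.30, "geneC": 0.12}
rate = {"geneA": 1.0, "geneB": 2.5, "geneC": 1.0}
inter_node_length = 1.0  # hypothetical, in the same relative time units

times = divergence_times(divergence, rate)
span = central_span(times.values())
print(f"intra-node span / inter-node length = {span / inter_node_length:.3f}")
```

Note that geneB is the most divergent gene in this made-up data set, yet after rate correction its inferred time matches that of geneC; only the residual spread in times bears on how long lineage separation took.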
4.4.2 Models for Lineage Separation
Regardless of the model for lineage separation, the very nature of recombination in bacteria predicts that different regions of orthologous bacterial chromosomes would show different times to the coalescent. Because gene exchange involves only a fragment of the chromosome, it cannot be true that all genes share the same time of divergence. But the data described above do not discriminate between the two models discussed above. That is, even if homologous recombination were too infrequent to prevent the inevitable accumulation of mutations, different genes would still diverge at different times. Therefore, merely demonstrating that different regions of the chromosomes diverged at different times does not establish that homologous recombination acts as a cohesion mechanism for bacterial species. This is shown in Fig. 3.7 A as the “rapid divergence” model. Here, the low rate of recombination means that few, if any, recombination events occur after lineage-specific adaptations are acquired. As a result, lineage-specific genes would be found inserted at random with respect to the age-structure of shared genes. On the other hand, if the rate of recombination is high, it would continue at loci unlinked to lineage-specific adaptations. Here, lineage-specific genes would lie primarily in the “older” regions of the genome, because regions unlinked to the insertions would continue to experience homologous recombination after acquisition (Fig. 3.7 B). Therefore, these two models can be discriminated by examining the age distribution of different regions of the chromosome with respect to the loci that bear adaptive changes between two lineages.
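One way to frame the comparison between the two models is as a test of whether genes flanking lineage-specific loci are more divergent than other shared genes. The sketch below uses a simple one-sided permutation test on the difference in mean divergence. It is an illustration under stated assumptions, not the analysis used in the studies cited here: the function names, the choice of test, and the divergence values are all hypothetical.

```python
import random
from statistics import mean

def flanking_excess(flanking, background):
    """Mean divergence of flanking genes minus that of all other shared genes."""
    return mean(flanking) - mean(background)

def permutation_p_value(flanking, background, n_permutations=2000, seed=0):
    """One-sided p-value for flanking genes being more divergent than expected."""
    rng = random.Random(seed)
    observed = flanking_excess(flanking, background)
    pooled = list(flanking) + list(background)
    k = len(flanking)
    hits = 0
    for _ in range(n_permutations):
        # Reassign the "flanking" label at random, as if insertions were
        # placed without regard to the age of the surrounding region.
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)

# Hypothetical per-gene divergence values.
flanking = [0.20, 0.22, 0.19, 0.21, 0.23]
background = [0.10, 0.11, 0.09, 0.12, 0.10, 0.11, 0.10, 0.09]

print("excess divergence:", round(flanking_excess(flanking, background), 3))
print("permutation p-value:", permutation_p_value(flanking, background))
```

A small p-value would be consistent with continued recombination at unlinked loci, while the absence of any excess would be what the rapid divergence model predicts.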
4.4.3 Species-Specific Genes Lie in Older Regions of the Chromosome
Adaptive changes may result from many processes, including mutation of existing genes, gene loss, and gene gain. Because the vast majority of single-base substitutions are neutral, it is very difficult to locate adaptive changes from sequence data alone. Yet gene gain and loss events are likely not neutral. Gene gain by HGT is an especially powerful mechanism whereby strains may explore novel niches using genes whose functions have been honed by selection in another organism (68, 69). Therefore, we focus on the changes in gene inventory as indicators of potentially adaptive mutational events.
The results are shown in Fig. 3.7 C. Here, the relative divergence of orthologous genes found in both the E. coli and the Salmonella genomes is summarized in several classes. In some cases, a gene has been inserted or lost within either the E. coli or the Salmonella lineage, so that genes which are adjacent in all strains of one species (genes AB) are interrupted by a novel locus in the other (genes AXB). We refer to these genes (genes AB in this example) as flanking species-specific genes. Another class of gene pairs includes those that are similarly interrupted, but in only one or a few strains in each species; these may be considered as flanking strain-specific genes, or those where the differences were acquired more recently. As shown in Fig. 3.7 C, genes flanking species-specific insertions appear in a context of orthologous genes that are more divergent than expected (66). These data suggest that recombination has proceeded between the nascent E. coli and Salmonella lineages after the acquisition of these loci.
4.5 The Muddling of Bacterial Species
The protracted period of time over which lineage differentiation occurs confounds our ability to use domains of recombination to delineate species as Dykhuizen and Green proposed. It is clear that gene exchange does not occur between strains of E. coli and strains of Salmonella enterica, defining them as different species. They are also phenotypically quite distinct from each other. In addition, gene exchange does occur among strains of E. coli and among strains of S. enterica, thereby placing them into the same species by Dykhuizen and Green’s criteria. But it is not clear that either group represents a genotypically or phenotypically cohesive group. Given the time taken to separate those lineages, it is clear that neither E. coli nor S. enterica represents a population of panmictic bacteria where recombining strains have an equal likelihood of exchanging genes at any locus. For example, there are commensal, uropathogenic, and enterohemorrhagic strains of E. coli, where each type has a distinct ecology and gene inventory (70, 71). While recombination would be suppressed adjacent to niche-defining loci, it would proceed at loci unlinked to niche-specific genes. More importantly, regions of free recombination would vary depending on which sets of strains were being compared. Therefore, while phylogenies of different E. coli genes are not congruent, thereby placing them in the same species sensu Dykhuizen and Green, the strains harboring those genes represent a mosaic of partially isolated groups with a reticulate pattern of gene exchange among them.
Thus, groups of bacteria defined by recombination have muddled, indistinct boundaries; barriers to recombination can only be identified long after recombination has ceased, thereby allowing diagnostic substitutions to accumulate, differentiating the orthologous genes. This problem is shared with ecotypes, which have indistinct boundaries due to the variable magnitude of the beneficial mutations, which then drive periodic selection events to variable taxonomic breadths. This situation is inevitable given the nature of bacterial recombination, which occurs on a gene-by-gene basis. Sexual eukaryotes may be described by powerful species concepts that neatly segregate these organisms into clearly defined groups. This separation is only possible because genetic isolation may occur simultaneously for all genes in the genomes. This is simply not feasible in free-living bacteria; because genetic isolation in bacteria is imposed over a long period of time, boundaries between groups are necessarily “fuzzy.” The biology of microorganisms precludes the formation of a robust species concept to delineate robustly distinct groups of bacteria.
5 The Interplay Between Homologous Recombination and HGT
5.1 Recombination Within Species Increases the Probability of Successful HGT Between Species
Just as genes introduced by HGT affect the rate of homologous recombination between incipient species, homologous recombination within species can influence the probability of successful gene acquisition. This occurs for two reasons. First, there is the hedge against stochastic loss. When a gene is first introduced, its frequency in the population is 1/N; because bacterial populations are so large, this initial frequency is quite small, so that the inevitable result of most gene acquisition events is rather rapid loss, even if beneficial functions are conferred. Homologous recombination acts to increase the frequency of beneficial acquired genes, thus decreasing the probability of stochastic loss. Second, and more importantly, homologous recombination places the HGT genes in different genomic contexts. Given the variability of gene inventories among different strains of the same species (70), it is very unlikely that the strain into which the genes were originally inserted is the one which could maximally benefit from the new genes’ functions. The success of a gene acquisition event is a function of the overall benefit conferred by the new genes, and that benefit may well be higher in another context. For example, genes allowing better scavenging of intracellular iron within eukaryotic cells would benefit pathogenic E. coli O157:H7 to a greater degree than commensal strains. Therefore, intraspecific gene transfer exposes a variety of strains to the new genes’ functions, thus increasing the probability of long-term retention.
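The scale of stochastic loss can be illustrated with a standard branching-process approximation, which is an assumption introduced here rather than a model taken from this chapter. If a newly acquired gene confers a selective advantage s and each cell leaves a Poisson-distributed number of descendants with mean 1 + s, the probability q that the lineage is eventually lost satisfies q = exp((1 + s)(q - 1)). The function name and parameters below are hypothetical.

```python
import math


def survival_probability(s, tol=1e-12, max_iter=100_000):
    """Chance that a single-copy gene with advantage s escapes loss.

    Assumes a very large population and Poisson offspring numbers with
    mean 1 + s, so the loss probability q is the smallest solution of
    q = exp((1 + s) * (q - 1)). Solved by fixed-point iteration from 0.
    """
    if s <= 0:
        return 0.0  # neutral or deleterious: loss is certain in this model
    q = 0.0
    for _ in range(max_iter):
        q_next = math.exp((1.0 + s) * (q - 1.0))
        if abs(q_next - q) < tol:
            q = q_next
            break
        q = q_next
    return 1.0 - q


if __name__ == "__main__":
    for s in (0.001, 0.01, 0.1):
        print(s, survival_probability(s))
```

For small s the survival probability is close to Haldane's classical value of 2s, so under these assumptions a gene conferring a 1% advantage would still be lost in roughly 98% of introductions. The sketch does not model recombination itself; it only shows why processes that raise the number of independent carriers of a beneficial gene make long-term retention more likely.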
5.2 HGT Promotes Homologous Recombination Within Species
5.3 Does HGT Invalidate the Concept of a Bacterial Species?
Given that bacteria exchange DNA across species boundaries by HGT, what does the species boundary represent? The genomes of bacteria from many lineages show evidence for large numbers of recently acquired genes (17,72,73). Aside from confounding bacterial phylogeny (74) – to the extent a phylogeny based on a small minority of constituent (“core”) genes can be used to infer relationships (75) – the transfer of genes across species boundaries calls into question the utility of the species concept itself. Mayr’s original concept, from which the Dykhuizen and Green formulation is derived, places high value on the lack of gene exchange across species boundaries. We argue here that, from a practical standpoint, the transfer of DNA between species does not invalidate the utility of species definitions, nor does it interfere with a species concept.
Species definitions play an important role in microbiology in allowing the robust and consistent identification of bacteria, which play important medical, epidemiological, economic, biotechnological, or bioweapons roles. The point emphasized in this chapter is that the exchange of traits among con-specific strains provides a level of genotypic cohesion that is useful in this context. Although strains within a species may vary in phenotype, they share a core set of traits that encapsulates functionally germane behaviors. The introduction of novel traits into a species is simply a source of phenotypic variation between strains. The similarity between donor and recipient clades imparted by such transfers is minor, and does not serve to confound the delineation of functionally important groups. For example, Escherichia coli received phosphonate utilization genes from an outside donor, likely an alpha-proteobacterium. But these genes alone do not confound the distinction between E. coli and Sinorhizobium meliloti. Lateral gene transfer does raise problems when attempting to classify organisms into strictly hierarchical schemes (76), but this does not impact the delineation of the species itself when multiple traits are used for species discrimination. Moreover, lateral gene transfer is infrequent relative to the rate of homologous DNA exchange within species. Therefore, it does not confound the delineation of species as a group of strains that share genotypic cohesion as the result of allelic exchange.
5.4 Bacteria Without a Species Concept
From a practical standpoint, all organisms must be given names so that they can be identified and discussed. As discussed above, these names carry connotations and implications when applied to bacteria, because named organisms play important roles in disease, food safety, biowarfare, and other public health arenas. What is problematic for microbiologists is that the name of a bacterium serves a dual role as that organism’s identifier and as its species name. That is, the simple act of assigning a name to a bacterium places it into a taxonomic group, which is useful when associated with a biological process underlying the similarity of its members. Above, we have argued that domains of homologous recombination work well in delineating some bacterial species. Yet one species concept alone is clearly insufficient to place all bacterial strains into well-ordered species. When the rate of recombination within a bacterial population is too low to prevent the inevitable diversification of strains, no cohesive group can be delineated. Any collection placing such diverse strains into a “species” is arbitrary; while this is a valid – and even useful – species definition, it is not a species concept. A group of more closely related strains – such as those formed by periodic selection events (Fig. 3.1) – would represent a group whose genotypic cohesion represents a biological process.
Disturbingly, it is quite possible that many bacteria cannot be placed into any group that can be delineated with a valid species concept. That is, they may not belong to a group whose genotypic similarity is the result of a cohesion mechanism such as shared domains of gene exchange, or shared ecological niches that are subject to purifying selective sweeps. If so, then they would not belong to any species that is based on a species concept. It is unfortunate that naming conventions in bacteria demand assigning these organisms to named taxonomic groups even if there are no non-arbitrary criteria for defining their boundaries.
6 Conclusions and Outlook
Rates of gene exchange between a bacterium and close relatives (recombination) are intimately associated with its rate of gene acquisition from distant relatives (HGT), with each process affecting the other. This interplay sheds light on the process of lineage separation, or the creation of new bacterial species. In the model presented above, we argue that the protracted and complex process of disentangling domains of homologous gene exchange prevents the clean segregation of bacterial strains into species by virtue of their shared gene pools.
- 1. Aristotle (1910) Historia Animalium (translated by D’Arcy Wentworth Thompson), Clarendon Press, Oxford.
- 2. Linnaeus, C. (1758) Systema naturae per regna tria naturae, secundum classes, ordines, genera, species, cum characteribus, differentiis, synonymis, locis, Holmiae.
- 3. Quinn, P. C. (2002) Young infants’ categorization of humans versus nonhuman animals: roles for knowledge access and perceptual process, in Building Object Categories in Developmental Time (Lisa Gershkoff-Stowe, D. H. R., ed.) Lawrence Erlbaum Associates, Mahwah, NJ.
- 4. Senate and House of Representatives of the United States of America (1973) Endangered Species Act of 1973. In. (Agency, E. P., ed.) Government of the United States of America Place.
- 5. Darwin, C. (1859) On the Origin of Species by Means of Natural Selection or the Preservation of Favoured Races in the Struggle for Life, John Murray, London.
- 7. Mayr, E. (1942) Systematics and the Origin of Species, Columbia University Press, New York.
- 8. Mayr, E. (1963) Animal Species and Evolution, Harvard University Press, Cambridge.
- 9. Paterson, H. E. H. (1985) The recognition concept of species, in Species and Speciation (Vrba, E. S., ed.) Transvaal Museum, Pretoria, 21–9.
- 12. Templeton, A. R. (1989) The meaning of species and speciation: a genetic perspective, in Speciation and Its Consequences (Otte, D., Endler J. A., ed.) Sinauer Associates, Sunderland, MA, 3–27.
- 19. Barcus, V. A., Titheradge, A. J., Murray, N. E. (1995) The diversity of alleles at the hsd locus in natural populations of Escherichia coli. Genetics 140, 1187–97.
- 26. Maiden, M. C., Bygraves, J. A., Feil, E., Morelli, G., Russell, J. E., Urwin, R., Zhang, Q., Zhou, J., Zurth, K., Caugant, D. A., Feavers, I. M., Achtman, M., Spratt, B. G. (1998) Multilocus sequence typing: a portable approach to the identification of clones within populations of pathogenic microorganisms. Proc Natl Acad Sci USA 95, 3140–5.
- 27. Feil, E. J., Holmes, E. C., Bessen, D. E., Chan, M. S., Day, N. P., Enright, M. C., Goldstein, R., Hood, D. W., Kalia, A., Moore, C. E., Zhou, J., Spratt, B. G. (2001) Recombination within natural populations of pathogenic bacteria: short-term empirical estimates and long-term phylogenetic consequences. Proc Natl Acad Sci USA 98, 182–7.
- 32. Guttman, D. S., Dykhuizen, D. E. (1994) Detecting selective sweeps in naturally occurring Escherichia coli. Genetics 138, 993–1003.
- 67. Ochman, H., Wilson, A. C. (1987) Evolutionary history of enteric bacteria, in Escherichia Coli and Salmonella Typhimurium: Cellular and Molecular Biology (Neidhardt, F. C., Ingraham J. L., Low K. B., Magasanik B., Schaechter M., Umbarger H. E., ed.) American Society for Microbiology, Washington, D. C. 1649–54.
- 70. Welch, R. A., Burland, V., Plunkett, G., 3rd, Redford, P., Roesch, P., Rasko, D., Buckles, E. L., Liou, S. R., Boutin, A., Hackett, J., Stroud, D., Mayhew, G. F., Rose, D. J., Zhou, S., Schwartz, D. C., Perna, N. T., Mobley, H. L., Donnenberg, M. S., Blattner, F. R. (2002) Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci USA 99, 17020–4.
- 74. Creevey, C. J., Fitzpatrick, D. A., Philip, G. K., Kinsella, R. J., O’connell, M. J., Pentony, M. M., Travers, S. A., Wilkinson, M., Mcinerney, J. O. (2004) Does a tree-like phylogeny only exist at the tips in the prokaryotes? Proceedings 271, 2551–8.