Operons, clusters of co-transcribed genes that often encode functionally linked proteins, are the principal form of gene organization and regulation in prokaryotes [1, 2]. Comparative analysis of bacterial and archaeal genomes has shown that only a few operons are conserved across large evolutionary distances. In general, gene order in prokaryotes is poorly conserved and prone to numerous rearrangements [36]. A detailed analysis of gene order conservation has shown that only 5-25% of the genes in bacterial and archaeal genomes belongs to gene strings (probable operons) shared by at least two distantly related species [7]. The presence of identical or similarly organized operons and suboperons in phylogenetically distant bacterial or archaeal lineages may result from three distinct evolutionary processes. Firstly, inheritance from the respective common ancestor - the core of the ribosomal protein superoperon is a case in point, but such conservation of operon organization is relatively rare; secondly, independent origin of identical operons or suboperons in different lineages; and thirdly, emergence of operons in a single lineage with subsequent dissemination by horizontal transfer. The potential central role of horizontal transfer in the evolution of operon organization of prokaryotic genomes is embodied in the 'selfish operon model' (SOM) [810]. This model posits that "the physical proximity of genes in an operon provides no selective benefit to the individual organism but does enhance the fitness of the gene cluster itself, as clusters can be efficiently inherited horizontally as well as vertically" [11]. Under SOM, operons are conceptually analogous to integrating viruses (phages), transposons and other mobile genetic elements, although coregulation of the genes in an operon could be an important selective factor that favors retention of operons during evolution.

Horizontal gene transfer (HGT) events have been classified into distinct categories of acquisition of new genes, acquisition of paralogs of existing genes and xenologous gene displacement whereby a gene is displaced by a horizontally transferred ortholog from another lineage (xenolog [12]). Each of these types of horizontal transfer is common among prokaryotes, but their relative contributions differ in different lineages [13]. Comparative-genomic analyses by many groups have suggested that, on the whole, horizontal gene transfer had substantial effects, albeit uneven in different lineages, on the gene content of bacterial and archaeal genomes [1319]. However, in spite of the considerable popularity of the selfish operon theory, we are unaware of systematic studies of horizontal gene transfer events at the level of operons. In part, this is likely to have been caused by the scarcity of experimental data on operon organization in any prokaryote other than Escherichia coli.

Recent phylogenetic analyses of ribosomal proteins revealed several instances of apparent xenologous gene displacement within a conserved operon, in which other genes have not been horizontally transferred; in other words, these operons appear to represent an evolutionary mosaic [2022]. Another study demonstrated a complicated mosaic organization of the leukotoxin operon in bacteria of the genus Mannheimia (Pasteurella); the observed evolutionary pattern had to be explained through multiple gene transfer events, which led to the hypothesis that, in this case, frequent gene displacement conferred selective advantage onto the bacterium by maintaining antigenic variation [23]. In earlier studies, evolution of operons from gene blocks with distinct evolutionary fates has been considered for rfb operons coding for lipopolysaccharide biosynthesis in enterobacteria [24].

To assess the role of horizontal gene transfer in the evolution of operons systematically, we undertook phylogenetic analysis of members of highly conserved gene neighborhoods that are predicted to constitute operons [25]. We focused primarily on mosaic operons in which one or more of the genes apparently have been transferred from distantly related species such that the phylogeny of the transferred genes is obviously incongruent with the phylogeny of the remaining genes in the respective operons.

Results and discussion

Identification of horizontal gene transfer

Experimental data on operons in organisms other than E. coli and, to a lesser extent, B. subtilis are scarce. Therefore we used conserved gene pairs and connected gene neighborhoods associated with them as an approximation of operon organization of genes in other prokaryotic genomes. Several studies have suggested strongly that all gene pairs that are conserved in multiple genomes belong to the same operon [7, 25, 26]. Here we used an extremely conservative threshold (conservation of a gene pair in 10 genomes) to ensure that only genuine operons were analyzed. BLASTP searches for potential horizontal gene transfer identified 729 candidate genes (9% of all genes comprising conserved neighborhoods in 41 analyzed genomes), that is, genes whose encoded protein sequences were more similar to homologs from phylogenetically distant taxa than to those from the reference taxon (it might be worth noting that, throughout this analysis, we treated genes as atomic units and did not consider the relatively unlikely possibility of HGT for portions of genes). Phylogenetic analysis of these genes and their neighbors revealed different types of evolutionary events, some of which involve whole operons, whereas others seem to reflect operon mosaicity.

Probable horizontal transfer of whole operons or large portions of operons, when phylogenetic trees for all genes in a predicted operon had the same topology (which, however, was incompatible with the species tree) was identified in 35 neighborhoods - approximately one third of all analyzed neighborhoods. These events were classified into three categories: acquisition of a new (for the given lineage) operon, paralogous operon acquisition and xenologous operon displacement [13]. Examples of all these classes of apparent operon transfer events are given in Table 1. These 35 neighborhoods generally represented functional classes of genes known to be prone to HGT: transporters, general metabolism-related genes and signal transduction systems [13, 15, 17]. This seems to be a relatively low level of horizontal transfer in view of the purported selfish behavior of operons [9, 10]. However, the strict threshold, described above, on the detection of conserved gene pairs undoubtedly led to many horizontally transferred operons being missed. Thus, the present analysis gives a conservative low bound of operon transfer.

Table 1 Examples of horizontally transferred operons

In addition, 19 predicted operons with different phylogenetic affinities of the constituent genes, that is, apparent mosaic operons, were identified (Table 2). Again, this is definitely a low bound - not only because of the high threshold set for the identification of conserved gene pairs, but also because this number includes only cases that were clearly resolved by phylogenetic tree analysis. In addition, we detected many uncertain cases where the different phylogenetic affinities of genes within an operon were not strongly supported (data not shown); at least some of these are probably also mosaic operons.

Table 2 Examples of probable mosaic operons

Below we describe in greater detail several case studies of putative mosaic operons; in each of these cases, in addition to the basic set of 41 species, we included in the analysis the apparent orthologs of the respective proteins from all prokaryotic species in which they were detected, in order to control for possible effects of taxon sampling. We found that, although the details of tree topology inevitably depended on the set of species analyzed, the conclusions regarding HGT were not affected by the inclusion of additional species.

Case studies of mosaic operons

Ribosomal protein L29 gene

In the previous study that prompted this work, we analyzed the phylogeny of several ribosomal proteins and found several cases of apparent horizontal transfer resulting in mosaic operon organization [20]. Horizontal transfer "in the heart of the ribosome" also has been independently described by others [21, 22]. Here we report another case of a ribosomal protein operon with apparent in situ gene displacement (that is, displacement without change of the local gene arrangement) via HGT. Figure 1a shows the highly conserved gene arrangement around the gene for the large subunit protein L29. The phylogenetic trees for the flanking L16 and S17 genes showed largely congruent topologies without any indications of HGT (Figure 1b,d). In contrast in the L29 tree, unexpected clustering is seen for Aquifex aeolicus and both Rickettsia: the Aquifex branch is within the archaeal cluster, whereas the Rickettsia group is with Chlamydia, rather than with the rest of alpha-proteobacteria: the taxon where Rickettsia belong (Figure 1c). In situ displacement is the most likely mechanism behind this observation given that the structure of this operon is conserved in the majority of bacteria. The nature of the selective advantages conferred by this gene substitution is unclear, but the apparent sources of the transferred genes suggest that the displacements indeed might be adaptive. Aquifex apparently acquired the L29 gene from archaea, which could be related to the adaptation to the hyperthermal conditions, whereas Rickettsia probably captured the gene from other parasitic bacteria, such as Chlamydia. However, these observations also allow a non-adaptationist interpretation, under which the apparent source of acquired genes simply reflects the increased likelihood of gene exchange between the respective organisms due to co-habitation, with chance fixation of some of the transferred genes.

Figure 1
figure 1

Genes with different phylogenetic affinities in a ribosomal operon from Aquifex aeolicus and Rickettsia prowazekii. (a) A fragment of ribosomal operon in Aquifex aeolicus (the operon from Thermotoga maritima is shown for comparison), Rickettsia prowazekii and Rickettsia conorrii (operons from other alpha-proteobacteria are shown for comparison). Genes are shown not to scale; the direction of transcription is indicated by arrows and gene numbers/names are given inside each arrow. Orthologous genes are shown by the same color. White arrows show genes in each genome that are unique in this operonic context. Phylogenetic affinity of a gene is shown as a thick colored border on the respective arrow; black denotes belonging to the reference taxon, red denotes not belonging to reference taxon. COG0197 - ribosomal protein L16/L10E; COG0255 - ribosomal protein L29; COG0186 - ribosomal protein S17. For species abbreviations, see Materials and methods. (b) Unrooted maximum-likelihood tree for ribosomal protein L16. Branches supported by bootstrap probability >70% are marked by black circles. Names of the genes from mosaic operons and the respective branches are shown in red. Branches for which the likelihoods of alternative placements were assessed using the RELL method are indicated by circles with numbers (see Table 3). (c) Unrooted maximum-likelihood tree for ribosomal protein L29;. the designations are as in Figure 1b. (d) Unrooted maximum-likelihood tree for ribosomal protein S17; the designations are as in Figure 1b.

Table 3 Kishino-Hasegawa test for the analyzed cases of apparent xenologous gene displacement in situ

The ruvBgene of Mycoplasma

The genes for Holliday junction resolvase subunits RuvA and RuvB form an operon that is conserved in most of the sequenced bacterial genomes (Figure 2a). In the phylogenetic trees for RuvA and RuvB, the branch that includes Ureaplasma and Mycoplasma occupies drastically different positions. In contrast to RuvA, which belongs to the Gram-positive clade as expected (Figure 2b), mycoplasmal RuvB clusters with the epsilon-proteobacteria (Helicobacter and Campylobacter) and the mycoplasma-epsilon-proteobacteria clade further joins alpha-proteobacteria (Figure 2c). This clustering is strongly supported by bootstrap analysis and was shown to be robust using statistical tests of tree topology (Table 3). Thus, the ruvB gene seems to have undergone xenologous displacement in situ after the divergence of the mycoplasmal branch from the rest of Gram-positive bacteria. Notably, the gene exchange seems to have occurred between phylogenetically distant parasitic bacteria.

Figure 2
figure 2

In situ displacement of the ruvB gene in Mycoplasma. (a) Organization of the Holliday junction resolvasome operon and surrounding genes in bacteria. COG0632 - Holliday junction resolvasome, DNA-binding subunit, COG2255 - Holliday junction resolvasome, DNA-binding subunit, COG0817 - Holliday junction resolvasome, endonuclease subunit, COG0392 - Predicted integral membrane protein, COG0282 - acetate kinase, COG0839 - NADH:ubiquinone oxidoreductase subunit 6 (chain J), COG0244 - ribosomal protein L10, COG0732 - restriction endonuclease S subunits, COG0809 - S-adenosylmethionine:tRNA-ribosyltransferase-isomerase, COG0772 - bacterial cell division membrane protein, COG0624 - acetylornithine deacetylase/succinyl-diaminopimelate desuccinylase and related deacylases, COG1487 - predicted nucleic acid-binding protein, COG1132 - ABC-type multidrug transport system, ATPase and permease components, COG0442 - prolyl-tRNA synthetase, COG0323 - DNA mismatch repair enzyme, COG1408 - predicted phosphohydrolases. The designations are as in Figure 1a. For species abbreviations, see Materials and methods. (b,c) Unrooted maximum-likelihood tree for RuvA (b) and RuvB (c); the designations are as in Figure 1b.

Undecaprenyl pyrophosphate synthase gene in the lipid biosynthesis operon of Rickettsia

In Rickettsia, the undecaprenyl pyrophosphate synthase gene (uppS), which belongs to a highly conserved doublet of lipid biosynthesis genes embedded in functionally diverse operons (Figure 3a), clusters with an unexpected assemblage of bacterial orthologs, including those from the spirochete Treponema pallidum and Fusobacterium nucleatum, but not with the 'native' taxon, alpha-proteobacteria (Figure 3b,c). Statistical testing of the tree topology showed that clustering of rickettsial uppS with those from other alpha-proteobacteria is highly unlikely (Table 3). The apparent in situ gene displacement of the uppS gene in Rickettsia was accompanied by a breakdown of the operon into three fragments (Figure 3a). The topology of the uppS tree suggests the possibility of multiple HGT events, although only the rickettsial genomes show evidence of gene displacement in situ. The emergence of gene displacement in bacterial parasites is noted here again.

Figure 3
figure 3

Genes with different phylogenetic affinities in the lipid biosynthesis operon of Rickettsia. (a) Organization of the lipid biosynthesis operon and surrounding genes in Rickettsia prowazekii and Rickettsia conorrii (operons from three other alpha-proteobacteria are shown for comparison). COG0020 - undecaprenyl pyrophosphate synthase, UppS; COG0575 - CDP-diglyceride synthetase; COG0750 - predicted membrane-associated Zn-dependent proteases; COG0233 - ribosome recycling factor; COG0528 - uridylate kinase; COG0745 - OmpR-like response regulator; COG0642 - signal transduction histidine kinase; COG0729 - outer membrane protein; COG2919 - septum formation initiator; COG0743 - 1-deoxy-D-xylulose 5-phosphate reductoisomerase. The designations are as in Figure 1a. For species abbreviations, see Materials and methods. (b,c) Unrooted maximum-likelihood tree for UppS (b) and CdsA (c); the designations are as in Figure 1b.

NADH:ubiquinone oxidoreductase subunits in Halobacteriumsp

Gene organization in the NADH:ubiquinone oxidoreductase operon is highly conserved in all sequenced archaeal genomes and those of several groups of bacteria (Figure 4a). The nuoI gene of Halobacterium sp. shows an unexpected phylogenetic affinity with proteobacteria (Figure 4c), whereas the neighboring genes have the regular archaeal affinities (Figure 4b,d). The unusual phylogeny of halobacterial NuoI, which was strongly supported by statistical tests (Table 3), suggests in situ displacement by a proteobacterial gene. Notably, all three NADH:ubiquinone oxidoreductase subunits of the cyanobacteria unexpectedly grouped within the archaeal clusters of the respective trees (Figure 4b-d). These observations point to a complex history of HGT for the genes encoding all subunits of NADH:ubiquinone oxidoreductase.

Figure 4
figure 4

In situ gene displacement in the NADH-ubiquinone oxidoreductase operon in Halobacterium. (a) Organization of the NADH-ubiquinone oxidoreductase operon in selected archaeal and bacterial genomes. COG0838 - NADH:ubiquinone oxidoreductase subunit 3 (chain A), COG3077 - DNA-damage-inducible protein J, COG0852 - NADH:ubiquinone oxidoreductase 27 kD subunit, COG0649 - NADH:ubiquinone oxidoreductase 49 kD subunit 7, COG1905 - NADH:ubiquinone oxidoreductase 24 kD subunit, COG1894 - NADH:ubiquinone oxidoreductase, NADH-binding (51 kD) subunit, COG1034 - NADH dehydrogenase/NADH:ubiquinone oxidoreductase 75 kD subunit (chain G), COG1005 - NADH:ubiquinone oxidoreductase subunit 1 (chain H), COG1143 - Formate hydrogenlyase subunit 6/NADH:ubiquinone oxidoreductase 23 kD subunit (chain I), COG0839 - NADH:ubiquinone oxidoreductase subunit 6 (chain J), COG0713 - NADH:ubiquinone oxidoreductase subunit 11 or 4L (chain K), COG1009 - NADH:ubiquinone oxidoreductase subunit 5 (chain L), COG1008 - NADH:ubiquinone oxidoreductase subunit 4 (chain M), COG1007 - NADH:ubiquinone oxidoreductase subunit 2 (chain N). The designations are as in Figure 1a. For species abbreviations, see Materials and methods. (b-d) Unrooted maximum-likelihood tree for NuoH (b), NuoI (c) and NuoJ (d); the designations are as in Figure 1b.

Lipopolysaccharide biosynthesis operon in Methanothermobacter thermoautotrophicus and Deinococcus radiodurans

The genes of the lipopolysaccharide biosynthesis (rfbABCD) operon appear to have been extensively and independently shuffled in many prokaryotic genomes and might have undergone multiple horizontal transfers. This conclusion is supported both by examination of the operon organization (Figure 5a) and by phylogenetic tree analysis (Figure 5b-e). The trees showed a clear affinity between the rfbA, rfbB, rfbC genes of Methanothermobacter thermoautotrophicum and Clostridium acetobutylicum (Figure 5b-d), with Fusobacterium nucleatum and Listeria monocytogenes joining the cluster in the case of rfbB (Figure 5b), whereas M. thermoautotrophicum RfbD clustered with its archaeal orthologs as expected (Figure 5e). The genes of the rfbABCD operon in Methanothermobacter are shuffled compared to the probable ancestral order, which is found in many bacteria and C. acetobutylicum also shows a rearrangement (Figure 5a). One likely scenario in this case is that M. thermoautotrophicum acquired the rfbABCD operon with the typical gene order from a bacterium of the clostridial lineage, which was followed by displacement of three resident genes and loss of one of the invading genes, accompanied by operon rearrangement. An alternative scenario is that the rearrangement occurred in the source bacterium of the clostridial group and Methanothermobacter acquired only the rfbACB portion, which might have inserted head-to-tail downstream of the original operon, followed by elimination of the resident rfbABC (Figure 5a).

Figure 5
figure 5

Genes with different phylogenetic affinities in the lipopolysaccharide biosynthesis operon of Methanothermobacter thermoautotrophicus and Deinococcus radiodurans. (a) Organization of the lipopolysaccharide biosynthesis operon in different prokaryotes. COG1091 - dTDP-4-dehydrorhamnose reductase; COG1209 dTDP-glucose pyrophosphorylase; COG1898 - dTDP-4-dehydrorhamnose 3,5-epimerase and related enzymes; COG1088 - dTDP-D-glucose 4,6-dehydratase. The designations are as in Figure 1a. For species abbreviations, see Materials and methods. (b-e) Unrooted maximum-likelihood tree for RfbB (b), RfbC (c), RfbA (d) and RfbD (e); the designations are as in Figure 1b.

Another interesting case of mosaic structure of the same operon is seen in Deinococcus radiodurans (Figure 5a). Deinococcus RfbA shows clear affinity with proteobacteria (Figure 5d), whereas RfbD is of archaeal descent (Figure 5e), with RELL analysis revealing no competing topologies (Table 3). The remaining two genes of this operon in Deinococcus, rfbB (DRA0041) and rfbC (DRA0043), have uncertain phylogenetic affinities (Figure 5b,5c). Thus, as in the case of M. thermoautotrophicus, this operon in Deinococcus was apparently formed through at least two events of xenologous gene displacement in situ and gene shuffling.

Leucine/isoleucine biosynthesis operon

Perhaps the most prominent case of mosaic operon organization is the leucine/isoleucine biosynthesis operon of several bacteria and archaea, particularly Thermotoga maritima. This is the only known branched chain amino acid biosynthesis operon, and it is partly conserved in a wide range of bacteria (Figure 6a). Following initial indications from the analysis of taxon-specific BLAST hits, we constructed phylogenetic trees for each of the genes of this operon. Unlike other bacteria, Thermotoga has two leuA paralogs, which are adjacent in the operon. The proteins encoded by these paralogous genes show clearly distinct phylogenetic affinities: TM0552 belongs to a distinct clade within the archaeal domain, whereas TM0553 is part of a Gram-positive bacterial cluster (Figure 6b). This phylogenetic mosaic in Thermotoga extends further, with LeuB (TM0556) clustering with proteobacterial orthologs (Figure 6c), and LeuC (TM0554) and LeuD (TM0555) with archaeal orthologs (Figure 6d,e). All these affinities were strongly supported by two versions of bootstrap analysis (Table 3). The genes encoding LeuA, LeuC, and LeuD from Thermotoga, Clostridium, Aquifex and both Pyrococcus abyssi and P. furiosus belong to a well-defined clade, which also includes a medley of alpha-proteobacteria and cyanobacteria, within the archaeal domain in the respective trees (Figure 6b-e). Thus, this sub-operon apparently has been relatively recently horizontally spread among these organisms. Pyrococcus abyssi and P. furiosus probably acquired these genes after the divergence from the common ancestor with P. horikoshii because the latter has only the typical archaeal operon (Figure 6a).

Figure 6
figure 6

Genes with different phylogenetic affinities in the leucine/isoleucine biosynthesis operon. (a) Operon organization in different prokaryotic species. COG0028 - acetolactate synthase, large subunit; COG0440 - acetolactate synthase, small subunit; COG0059 - ketol-acid reductoisomerase; COG0129 - dihydroxyacid dehydratase; COG0119 - isopropylmalate synthases; COG0473 - isocitrate/isopropylmalate dehydrogenase; COG0066 - 3-isopropylmalate dehydratase, small subunit; COG0065 - 3-isopropylmalate dehydratase, large subunit. The designations are as in Figure 1a. For species abbreviations, see Materials and methods. (b-e) Unrooted maximum-likelihood tree for LeuA (b), LeuB (c), LeuC (d) and LeuD (e); the designations are as in Figure 1b.

Given the apparent propensity of Thermotoga (and other hyperthermophilic bacteria) for acquisition of archaeal genes via HGT, it seems most likely that the archaeal version of the leuACD suboperon originally entered the bacterial domain via Thermotoga or a related thermophilic bacterium. Formally, in Thermotoga these events could be classified as a combination of paralogous (sub)operon acquisition (TM0554-TM0555 in addition to another paralogous archaeal gene pair TM0291-TM0292) and xenologous gene displacements (genes TM0553, TM0556). In Clostridium, xenologous operon displacement seems to have occurred because the ancestral operon of the Gram-positive type apparently had been lost. The subsequent evolution of this operon in the four organisms proceeded along different paths. Aquifex has lost the operon structure even for the two subunits of 3-isopropylmalate dehydratase (LeuB, LeuD). Different genes in the operons of P. abyssi and C. acetobutylicum have been translocated and several genes probably have been independently accrued (Figure 6a). In both P. abyssi and Thermotoga, the original leuA and leuB genes within the leuABDC core seem to have been independently displaced by bacterial orthologs without a clear affinity with any specific bacterial lineage (Figure 6a). The most likely scenario for evolution of this operon in Thermotoga is that it originated as a Gram-positive type operon and subsequently many genes (or sub-operons) have been displaced in situ through multiple horizontal transfers and a few additional genes have been inserted into the preexisting structure. The alternative but less likely hypothesis involves independent, de novo operon assembly from genes of different phylogenetic affinities. Several other apparent HGT events were detected during the analysis of the phylogenetic trees for leucine biosynthesis genes (DR1614 in LeuD tree, DR1610 in LeuC tree (Figure 6d,e)) but, in these cases, the acquired genes do not belong to conserved operons.


Intragenomic plasticity and inter-species horizontal mobility of operons are thought to be important facets of prokaryotic genome evolution. Indeed, the results presented here indicate that horizontal transfer of entire operons is the most likely explanation for most of the findings of co-localized 'alien' genes in a genome, which is generally consistent with SOM. However, a substantial fraction - approximately 35% - of operons with indications of horizontal transfer events appear to consist of genes with different phylogenetic affinities. Barring artifacts of phylogenetic analysis, which can never be ruled out completely, but appear unlikely given the strong statistical support for the anomalous placement of the genes in question in phylogenetic trees, two evolutionary scenarios for the origin of such mosaic operons are conceivable. The first involves de novo assembly of operons, in part from genes acquired via HGT, whereas the second one postulates in situ xenologous displacement of genes within a resident operon. Analysis of mosaic operons suggested that both scenarios might apply, but in situ displacement is likely to be more frequent. In several cases, in situ displacement seems to have occurred between genomes of distantly related parasitic bacteria that might have shared a host. A sequence of events that is often considered as an alternative to HGT is an ancient duplication with subsequent differential loss of paralogs. However, in the cases analyzed here, this seems to be a particularly remote possibility because a tandem duplication followed by lengthy evolution of both paralogs within the operon would be required to mimic in situ displacement. Tandem pairs of paralogs are uncommon in operons and such a 'smoking gun' was not observed in any of the suspected cases of in situ displacement.

At first glance, in situ gene displacement seems highly unlikely: given the vast evolutionary distance separating the donor and recipient genomes, homologous recombination is out of the question. In cases when the displacing gene(s) is located on the periphery of an operon (for example, Figure 5a), a plausible mechanism could involve initial insertion of the invading gene in the vicinity of the resident operon, followed by deletion of intervening genes (provided these are non-essential). However, when the displacing gene is tucked between resident ones (for example, Figures 4a, 6a), displacement must have occurred with surgical precision. The only conceivable explanation seems to be that HGT is extremely common in the evolution of prokaryotes and so is intragenomic recombination, which provides for rare chance occurrences of in situ displacement. Conceivably, a horizontally acquired gene that displaces the resident ortholog without disruption of operon organization would have its chances of evolutionary fixation greatly increased, hence the apparent disproportional survival of the displacing genes. This explanation does not refute SOM as the conceptual framework explaining the origin of operons but emphasizes the 'altruistic' aspect of the evolution of operons whereby the operon integrity is maintained by strong purifying selection at the organism level.

Materials and methods

Sequence data

Amino acid sequences from 41 completely sequenced prokaryotic genomes were extracted from the Genome division of the Entrez retrieval system [27] and used as the master species set for this analysis. Bacterial species abbreviations: Aquifex aeolicus (Aae), Bacillus halodurans (Bha), Bacillus subtilis (Bsu), Streptococcus pyogenes (Spy), Staphylococcus aureus (Sau), Clostridium acetobutylicum (Cac), Borrelia burgdorferi (Bbu), Campylobacter jejunii (Cje), Chlamydia trachomatis (Ctr), Chlamydophila pneumoniae (Cpn), Deinococcus radiodurans (Dra), Escherichia coli (Eco), Haemophilus influenzae (Hin), Helicobacter pylori (Hpy), Lactococcus lactis (Lla), Mesorhizobium loti (Mlo), Mycoplasma genitalium (Mge), Mycoplasma pneumoniae (Mpn), Mycobacterium tuberculosis (Mtu), Mycobacterium leprae (Mle), Pasteurella multocida (Pmu), Neisseria meningitidis (Nme), Pseudomonas aeruginosa (Pae), Rickettsia prowazekii (Rpr), Rickettsia conorii (Rco), Synechocystis PCC6803 (Ssp), Thermotoga maritima (Tma), Treponema pallidum (Tpa), Vibrio cholerae (Vch), Xylella fastidiosa (Xfa), Buchnera sp. (Bsp), Caulobacter crescentus (Ccr), and Ureaplasma urealyticum (Uur). Archaeal species abbreviations: Aeropyrum pernix (Ape), Archaeoglobus fulgidus (Afu), Halobacterium sp. (Hsp), Methanothermobacter thermoautotrophicum (Mth), Methanococcus jannaschii (Mja), Pyrococcus horikoshii (Pho), Pyrococcus abyssi (Pab), Thermoplasma volcanium (Tvo), Thermoplasma acidophilum (Tac), Sulfolobus solfataricus (Sso). In addition, the following species were included in the case studies described in the text; bacteria: Agrobacterium tumefaciens (Atu), Bifidobacterium longum (Blo), Brucella melitensis (Rso), Chlorobium tepidum (Cte), Enterococcus faecalis (Efa), Fusobacterium nucleatum (Fnu), Lactobacillus plantarum (Lpl), Leptospira interrogans serovar (Lint), Listeria innocua (Lin), Listeria monocytogenes (Lmo), Nitrosomonas europaea (Neu), Nostoc sp. (Nsp), Oceanobacillus iheyensis (Oih), Ralstonia solanacearum (Rso), Sinorhizobium meliloti (Sme), Streptomyces coelicolor (Sco), Thermoanaerobacter tengcongensis (Tte), Thermosynechococcus elongatus (Tel), Xanthomonas campestris (Xca), Shewanella oneidensis (Son); archaea: Methanopyrus kandleri (Mka), Methanosarcina acetivorans (Mac), Pyrobaculum aerophilum (Pae), Pyrococcus furiosus (Pfu).

Reconstruction of gene neighborhoods

Gene neighborhoods for the 41 compared genomes were reconstructed as previously described [25]. Briefly, the collection of clusters of orthologous groups of proteins from complete genomes (COGs) [28] was used as the source of information on orthologous relationships for detecting conserved gene pairs. For the purpose of this analysis only 'highly conserved' gene pairs were considered, that is, those formed by genes from two COGs that were present in the same orientation and separated by less than three genes in at least 10 of the compared genomes. This conservative approach was adopted in order to ensure that all analyzed gene pairs belong to the same operon. At the next step, overlapping gene pairs were joined in triplets; each triplet was required to exist in at least one genome. Overlapping triplets were used to construct gene arrays by run search in an oriented graph; a gene array may or may not be found in its entirety in any available genome. Finally, gene arrays that shared at least three COGs were clustered into neighborhoods by using a single-linkage clustering algorithm [25]. Conserved gene pairs that did not belong to the reconstructed gene arrays were also analyzed.

Searching for candidate horizontally transferred genes

The protein sequences encoded by the genes of each neighborhood were searched against the non-redundant protein sequence database (NCBI, NIH, Bethesda) using the BLASTP program. The BLAST hits were analyzed to identify their potential phylogenetic affinity. For each protein, the best hits were identified to the taxon to which the given species belongs (hereinafter, reference taxon) and to other major taxa; hits to closely related species were disregarded (see Table 1S in the additional data file). Proteins that had more significant (lower E-value) hits to a non-reference taxon than to the reference taxon were considered candidates for horizontal transfer and the respective orthologous protein clusters were subject to further phylogenetic analysis as described in the next section. If phylogenetic analysis indicated that a particular gene was likely to be horizontally transferred, phylogenetic trees were built also for the genes predicted to belong to the same operon. When different phylogenetic affinities were found for genes of the same predicted operon, this operon was considered to be 'mosaic'.

Phylogenetic analysis

Multiple protein sequence alignments were constructed using the T-Coffee program [29] and positions containing >70% gaps were excluded. Distance trees were constructed by using the least-square method as implemented in the FITCH program of the PHYLIP package [30, 31]. The least-square trees were subjected to maximum-likelihood local rearrangement using the ProtML program of the MOLPHY package, with the JTT-F model of amino acid substitutions [32, 33]. The resulting trees are a surrogate for maximum-likelihood phylogenies; exhaustive maximum-likelihood tree construction is impractical for the number of species analyzed here. Bootstrap analysis was performed for each maximum-likelihood tree using the Resampling of Estimated Log-Likelihoods (RELL) method as implemented in MOLPHY [3234]. Alternative placements of selected clades in maximum-likelihood trees were compared by using the rearrangement optimization (Kishino-Hasegawa) method as implemented in the ProtML program [34].

Additional data file

Additional data, including schematics of operon organization and phylogenetic trees for all gene clusters listed in Table 2, are available in an additional data file (Additional data file 1).