Background

Obligate mutualistic symbioses between insects and proteobacteria have been extensively studied in recent years [1]. The bacteria live in specialized host cells and synthesize the nutrients that are deficient in the insects' restricted diets, as confirmed by genomic analyses of several endosymbionts from different insect species. Thus, Buchnera aphidicola and Blochmannia spp. mainly provide amino acids to their hosts, aphids and ants respectively [2–7], whereas Wigglesworthia glossinidia supplies vitamins and cofactors to the tsetse fly [8]. In the case of the sharpshooters, genomic analysis revealed that two cohabiting endosymbionts, Baumannia cicadellinicola and Sulcia muelleri (a Bacteroidetes species), are engaged in the symbiotic relationship, providing vitamins and amino acids respectively [9].

Because they live in a protected and nutrient-rich environment, these endosymbionts have undergone a reductive evolutionary process leading to smaller genome sizes than those of their free-living relatives. Nevertheless, they have retained the genes involved in the symbiotic relationship, as well as a reduced repertoire of genes necessary to maintain the three essential functions that define a living cell: maintenance, reproduction and evolution [10]. B. aphidicola BCc, despite its extremely reduced genome (422 kb, and only 362 protein-coding genes), still retains a complete machinery for DNA replication, transcription and translation, and a simplified metabolic network for energy production and the synthesis of most essential amino acids needed by its aphid host. However, the loss of genes for the synthesis of tryptophan and riboflavin suggests that it cannot guarantee its host's fitness, and it is plausible that a second endosymbiont could be taking on its symbiotic role [7].

Candidatus Carsonella ruddii, considered the primary endosymbiont of the psyllid Pachypsylla venusta, possesses a 159,662-bp genome with only 182 predicted open reading frames [11]. This number of genes is far below previous proposals for minimal genomes (reviewed in [12]) and is about half the number of genes identified in B. aphidicola BCc. Such a small number of genes casts doubt on the status of C. ruddii as a living cell. We present a detailed functional analysis of the genes retained in this small genome in order to gain new insights into the physiological role of this putative cell and the significance of the dramatic reductive process undergone by its genome.

Results and discussion

Primary endosymbionts of insects cannot be cultured in the laboratory, making experimental analysis difficult. Therefore, the functional analysis of their genome content must rely on comparative analyses with related bacteria. However, caution must be taken when assigning a function using this approach, especially if shortened genes are identified, since gene regions coding for specific protein domains that are related to the annotated function may have been lost. In some cases, genes encoding proteins with multiple functions may lose the domain(s) responsible for some of these activities [see Additional file 2]. These circumstances might lead to the assignment of a gene to an incorrect functional category, and hence to an erroneous determination of the set of functions a bacterium is able to perform. C. ruddii genes are considerably shorter than the orthologs found in other bacteria, and show extensive genetic divergence with respect to other γ-proteobacteria, mainly because of their high A+T content. Therefore, the risk of over-annotation is especially high for this genome.

For C. ruddii to be considered a living organism engaged in a symbiotic relationship with its host, the genes involved in essential living functions, as well as those needed for the maintenance of host fitness, must be preserved. To determine whether the C. ruddii genome fulfils both conditions, we focused our analysis on the genes involved in informational processes (needed for the most essential functions for survival) and in the biosynthesis of amino acids (the proposed endosymbiotic role of this biological entity).

One of the most comprehensive efforts to define the minimal core of essential genes was that presented by Gil and co-workers [13, 14]. This study is a good starting point for identifying the essential genes involved in informational processes that must be present in any living cell. We compared the complete set of genes of the minimal genome proposed by Gil et al. [13] with the gene repertoire of C. ruddii after the reannotation analysis performed as described in Methods. The results of the analysis clearly show that the C. ruddii gene complement is not sufficient to replicate, transcribe and synthesize proteins (Table 1), thus questioning its consideration as an independent living entity. It is worth mentioning that only 29 genes remain orphans after our genome reannotation [see Additional file 1]. However, even if we could assign a function to them, many essential functions would still be lacking. Regarding DNA maintenance and replication, no genes for histone-like and single-stranded DNA-binding proteins (involved in the structural maintenance and protection of the chromosomal DNA) are found in the genome; replication initiation must be dependent on RecA, since no recruiting proteins are present [15]; and the complete DNA replisome is represented only by the two essential core subunits of the DNA polymerase, since the putative genes for the helicase and primase (two enzymes needed for the initiation of replication, which separate the strands of the double helix and prime the single strand to be replicated with an RNA primer) are highly degraded [see Additional file 2]. Furthermore, no genes for gyrase (needed to relax the positive supercoils generated by the replication process), ligase (essential for joining the DNA fragments in the lagging strand), or RNase HI (needed to eliminate the RNA primer) have been found.
The transcription machinery is limited to the core subunits of RNA polymerase, since the gene for the proposed sigma factor subunit has lost most of its sequence, including the residues involved in promoter recognition, and no other transcription factors have been annotated. Finally, the translation machinery is highly reduced. It contains the minimal set of RNA genes required for protein synthesis, including the three rRNA genes and 28 tRNA genes, enough for decoding the 61 codons of the genetic code, although the gene mesJ, coding for the tRNAIle-lysidine synthetase required for the maturation of tRNAIle (CAT), is missing. Moreover, at least 9 aminoacyl-tRNA synthetases and 15 out of 50 proposed essential ribosomal protein components are missing or degraded, questioning the capacity of C. ruddii to build functional ribosomes. Seven previously unannotated ORFs could in fact be remnants of ribosomal protein genes, based on their genome position and considering some degree of synteny with other completely sequenced γ-proteobacteria, but only three of them (rplQ, rpsE, and rpsO) retain enough homology to be annotated as such. Finally, most ribosome maturation proteins and several essential translation factors (such as the elongation factors P and Ts) are also missing from this genome.
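The comparison between a proposed minimal gene set and a reannotated genome can be pictured as a simple lookup. The following is only a minimal sketch under stated assumptions: the gene names (`geneA` to `geneE`), the three status labels and the function `missing_or_degraded` are hypothetical placeholders, not the actual gene lists or the procedure used in this study.

```python
from typing import Dict, Set


def missing_or_degraded(minimal_set: Set[str],
                        genome_status: Dict[str, str]) -> Dict[str, str]:
    """Report every gene of the minimal set that is not functional in the genome.

    Genes without an entry in genome_status are reported as 'absent'.
    """
    report = {}
    for gene in sorted(minimal_set):
        status = genome_status.get(gene, "absent")
        if status != "functional":
            report[gene] = status
    return report


# Hypothetical placeholder data, for illustration only.
minimal_set = {"geneA", "geneB", "geneC", "geneD"}
genome_status = {
    "geneA": "functional",
    "geneB": "degraded",    # an ortholog is recognizable but has lost key domains
    "geneD": "functional",
    "geneE": "functional",  # present in the genome but not part of the minimal set
}

report = missing_or_degraded(minimal_set, genome_status)
print(report)
```

In this toy case the report lists `geneB` as degraded and `geneC` as absent, which mirrors the two categories gathered in Table 1.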

Table 1. Essential genes involved in informational processes that are missing or degraded in the C. ruddii genome.

As it lacks most genes for DNA replication, transcription and translation, this biological entity might be dependent on an external source for these functions. Although the transfer of genes from C. ruddii to the host nucleus has been proposed [11], it should be taken into account that the ancestor of the psyllids infected by the ancestor of C. ruddii was already a complex multicellular organism, so that all transferred genes must have been acquired by the germinal cell line. The number of genes that would have had to be transferred is so large that it is more plausible that this intracellular entity takes advantage of some nuclear genes involved in mitochondrial activity. The dual targeting of proteins encoded by the eukaryotic nucleus to different organelles (mitochondria and plastids) has already been described in plant cells, especially for genes involved in informational processes [16, 17].

In addition to the essential functions that define life, an endosymbiont must provide its host with all essential complements to its nutritionally deficient diet. Psyllids feed on phloem sap, which is rich in sugars but relatively poor in nitrogenous compounds, especially essential amino acids. Since it has been proposed that the role of C. ruddii, like that of B. aphidicola, is to provide those amino acids to its host [18], we looked for the maintenance of all pathways involved in essential amino acid biosynthesis [see Additional file 3] [19, 20]. The analysis revealed that the pathways for the synthesis of histidine, phenylalanine and tryptophan are absent. Moreover, although the threonine biosynthetic pathway is complete, thrB is probably not functional, and thus an as yet unidentified activity must supply this function. The same situation occurs in Candidatus Ruthia magnifica, an autotrophic endosymbiont with a complete metabolic network in which the thrB gene is absent in the context of an otherwise complete threonine biosynthetic pathway [21]. This limited provision of amino acids is not enough to sustain the requirements of the host. The loss of essential endosymbiotic functions has also been detected in other insect-bacteria associations but, in such cases, a second symbiont appears to be complementing the insect diet [7, 9]. Surprisingly, although secondary symbionts have been found in other psyllids, no other symbionts were detected in Pachypsylla venusta [11, 18]. Nevertheless, this statement is based on a single work, in which only bacterial symbionts were searched for by PCR amplification of 16S-23S rDNA [18], and on the absence of contaminating sequences during the sequencing project, which was performed on DNA purified from bacteriocytes [11]. Therefore, a problem with a specific amplification reaction cannot be ruled out, and an extracellular and/or eukaryotic symbiont would not have been detected by these analyses [22, 23].
If another partner is detected in this symbiotic association, the possibility that C. ruddii is the remnant of an ancient endosymbiont that is being driven towards extinction and replacement by the new symbiont, as has already been proposed in other insects [7, 22, 24], cannot be ruled out.

Conclusion

A careful functional analysis of the gene repertoire of C. ruddii reveals that the extensive degradation of its genome is not compatible with its consideration as a mutualistic endosymbiont and, moreover, as a living organism. Although C. ruddii is defined as a psyllid primary endosymbiont, the genes for the biosynthesis of three essential amino acids have been completely lost. This observation raises doubts about both the role of C. ruddii in the symbiotic relationship with its host and the absence of a secondary symbiont capable of providing the remaining nutrients needed to complement the unbalanced insect diet. Although a second bacterial symbiont has not been found in the psyllid Pachypsylla venusta, further studies need to be conducted in order to explore the possibility of a second, maybe eukaryotic, symbiont.

We propose that this strain of C. ruddii can be viewed as a further step towards the degeneration of the former primary endosymbiont and its transformation into a new subcellular entity between living cells and organelles, which would probably take advantage of mitochondrial functions encoded by the nucleus, especially for the basic informational processes needed for maintenance and multiplication. If confirmed, this would be the first example of such a scenario in animal cells.

Methods

In order to confirm the presence and functionality of all ORFs identified in the original report and to search for additional functions, the complete genome sequence of C. ruddii (accession number AP009180) was re-analyzed. This is particularly relevant in this case, as 46 of the putative genes (25% of the genome) are annotated as hypothetical ORFs in the original report. ORFs with putative functions were obtained from the original annotation of the genome and complemented with Glimmer predictions [25]. This set of putative ORFs was checked by homology searches using BLASTX and the latest version of the GenBank database. All hits with e-values above 1e-02 were disregarded. Whenever a clear homology was found, the putative ORF was translated, and the protein sequences were aligned using ClustalW [26] with the corresponding translated orthologous genes found in E. coli and all sequenced B. aphidicola genomes (accession numbers U00096.2, BA000003, AE013218, AE016826 and CP000263). Many proteins were found to be rather degraded, presenting numerous deletions and a high number of amino acid changes. In order to confirm the maintenance of the original function, we looked for the presence of the domains and active residues (if known) responsible for functionality, using information in the UniProt, Pfam, EcoCyc, and EcoGene databases [27–30]. The secondary structures of the resulting proteins and their orthologs were also compared, to exclude major structural changes that could impede function.
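The e-value cutoff described above can be illustrated with a minimal sketch. It assumes that the BLASTX hits are available in the standard 12-column tabular format, in which the e-value is the eleventh field; the text does not state which output format was actually used, and the function name `filter_hits` and the two sample hits are made up for illustration.

```python
from typing import Iterable, List

EVALUE_COLUMN = 10   # 0-based index of the e-value in 12-column tabular BLAST output
MAX_EVALUE = 1e-2    # cutoff stated in the text: hits above this value are disregarded


def filter_hits(lines: Iterable[str],
                max_evalue: float = MAX_EVALUE) -> List[List[str]]:
    """Return the tab-separated fields of every hit with e-value <= max_evalue."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[EVALUE_COLUMN]) <= max_evalue:
            kept.append(fields)
    return kept


# Made-up hits for illustration only (not data from the study).
example = [
    "orf1\tprotA\t45.0\t200\t110\t2\t1\t600\t5\t204\t1e-30\t120",
    "orf2\tprotB\t28.0\t60\t43\t1\t10\t189\t40\t99\t0.5\t30.1",
]
kept_ids = [fields[0] for fields in filter_hits(example)]
print(kept_ids)
```

Only the first hit passes the filter; the second, with an e-value of 0.5, is discarded. Hits that pass would then go on to the alignment and domain checks described in the text.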

To look for possible essential functions that might have been missed in the original annotation, the genes responsible for these functions in γ-proteobacteria were retrieved and used to build Hidden Markov Models and perform searches in the C. ruddii genome with HMMER [31]. Manual inspection of the results can reveal subtle homologies that could have been missed in the BLASTX search. In addition, positional information was taken into account to identify genes that maintain synteny with other completely sequenced γ-proteobacterial genomes. The combined strategies allowed the annotation of 17 genes previously considered orphans, six of which appear to be functional. In addition, 9 previously annotated genes were considered non-functional [see Additional file 1]. Therefore, only 29 ORFs in the C. ruddii genome remain annotated as hypothetical proteins, while 20 ORFs can be traced back to ancestral functional genes, although the changes in their sequences indicate that they are probably non-functional.
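The counts reported in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only numbers given in the text; the decomposition of the 20 traceable but probably non-functional ORFs into newly annotated non-functional orphans plus reclassified genes is an assumption, namely one reading that reproduces the reported totals, and the variable names are illustrative.

```python
# Counts taken from the text.
total_orfs = 182                 # predicted ORFs in the original report
original_hypothetical = 46       # ORFs annotated as hypothetical in the original report
newly_annotated = 17             # former orphans annotated in the reanalysis
newly_annotated_functional = 6   # of those, the ones that appear to be functional
reclassified_nonfunctional = 9   # previously annotated genes now judged non-functional

# ORFs that remain annotated as hypothetical proteins.
remaining_hypothetical = original_hypothetical - newly_annotated

# Assumed decomposition: non-functional former orphans plus reclassified genes.
degraded_traceable = (newly_annotated - newly_annotated_functional) + reclassified_nonfunctional

# Share of the original ORF set that was annotated as hypothetical.
hypothetical_percent = round(100 * original_hypothetical / total_orfs)

print(remaining_hypothetical, degraded_traceable, hypothetical_percent)
```

Under this reading the script gives 29 remaining hypothetical ORFs, 20 degraded but traceable ORFs, and 25% of the original ORFs annotated as hypothetical, in agreement with the figures quoted in the text.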

Ribosomal RNA genes were confirmed with BLASTN. Anticodons for tRNA genes were identified with tRNAscan-SE [32]. Discrimination of tRNA genes with anticodon CAT was performed with the program TFAM [33] using the tRNA profiles of initiator tRNAMet, elongator tRNAMet and tRNAIle(CAT) [34].

Metabolic pathways were reconstructed using KEGG [35], and enzyme characteristics were checked in BRENDA [36].