Background

Methionine residues of polypeptide chains are common targets of oxidation, which alters the conformation, sub-cellular localization, and aggregation state of proteins, with detrimental effects on vital cell functions and activities [1],[2]. Aerobic organisms thus urgently need to repair their methionine-oxidized proteins, and their minimal gene sets systematically include Msr genes [3],[4].

Increasing concentrations of the water-borne signaling proteins (pheromones) that Euplotes raikovi uses to promote its vegetative (mitotic) growth and the sexual phenomenon of conjugation [5] were observed to undergo oxidation in a cause-effect relationship with cell ageing [6]. This oxidation affects the methionine residues that are most exposed on the surface of the pheromone molecular structure and was shown to cause remarkable modifications of protein activity [6], as is the case in other cell systems [2]. To shed light on the molecular mechanism evolved by E. raikovi to repair its methionine-oxidized pheromones, attention was focused on the Msr genes that are transcribed in the cell somatic nucleus (macronucleus). This nucleus is characterized by an unusual sub-chromosomal organization in which individual, gene-size DNA molecules are present in thousands of copies, each fully autonomous for both replication and transcription [7].

Unlike the MsrB gene, which shows a single form, the gene specifying MsrA was found to be present in the E. raikovi macronucleus in multiple isoforms [8]. One isoform, designated the msrAB gene, is described here for its unique nucleotide sequence, which contains information for the synthesis of MsrA and MsrB proteins characterized by unequivocal structural relationships with the MsrA and MsrB of Alphaproteobacteria.

Results and discussion

The msrAB gene cloning involved two PCR steps. A 231-bp MsrA-specific DNA fragment was first generated by amplifying total DNA preparations with a combination of degenerate oligonucleotides (labeled #1 and #2 in Additional file 1: Table S1) specific to amino acid sequence stretches conserved in MsrA proteins of various organisms. In a second step, two nested PCR amplifications were run using primers (#3 to #6 in Additional file 1: Table S1) specific to this DNA fragment in combination with a primer (#7 in Additional file 1: Table S1) specific to the C4A4/G4T4 repeats that are distinctive of the telomeric ends of every Euplotes macronuclear gene-size molecule [7]. Among the four structurally distinct gene isoforms that were obtained, we reconstructed the full-length sequence of the longest isoform (1595 bp) by overlapping the individual sequences. The reconstructed sequence was then confirmed by sequencing the amplification product of a PCR run with primers (#8 and #9 in Additional file 1: Table S1) specific to regions located close to its telomeric ends.
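As an illustration of what "overlapping the individual sequences" involves, the following minimal Python sketch joins two fragments on their longest shared suffix/prefix. It is not the procedure used in this study: the function name `merge_overlapping`, the `min_overlap` threshold, and the two toy fragments are all hypothetical, and real reads would also require handling of sequencing errors and reverse complements.

```python
def merge_overlapping(left: str, right: str, min_overlap: int = 10) -> str:
    """Join two fragments on the longest suffix of `left` that is also a prefix of `right`."""
    max_len = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink until one matches exactly.
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    raise ValueError("no overlap of the required length found")


# Toy fragments (hypothetical, not taken from the msrAB sequence).
# The first starts with C4A4 repeats, the second ends with G4T4 repeats.
upstream = "CCCCAAAACCCCAAAATTGACGTTAGCATCGGATCA"
downstream = "GTTAGCATCGGATCAGGCTTAAGGGGTTTTGGGGTTTT"

contig = merge_overlapping(upstream, downstream)
print(len(contig), contig)
```

Exact matching is the simplest possible criterion; it is chosen here only to keep the sketch short.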

Instead of including a single open reading frame (ORF) like the other three gene isoforms (obtained incomplete at their 3’ regions), the 1595-bp isoform exceptionally included three potential ORFs (Figure 1).

Figure 1

Structure of the Euplotes raikovi msrAB gene. Nucleotide sequence: telomeric repetitions, italics; the 5’ and 3’ non-coding regions, lower case letters; coding regions, capital letters; in-frame TGA codons, underlined; ATG, TAA and TAG start and stop codons, shadowed. Deduced amino acid sequence: blue, green and red letters distinguish the putative proteins encoded by the three ORFs.

The first ORF (ORF-1), spanning from ATG at position 763 to TAA at position 1269, matched the ORF of the three other gene sequences (Additional file 2: Figure S1). It encodes a 168-amino acid MsrA protein showing a much closer structural identity (79-90%) to MsrAs of Rhodobacterales, such as Thalassobacter (re-classified as Litoreibacter [9]) and Oceanicola, and Rhizobiales, such as Sinorhizobium, than to any eukaryotic MsrA, including those of ciliates such as Tetrahymena and Paramecium (Figure 2).

Figure 2

Sequence alignment of the Euplotes raikovi MsrA protein (red) encoded by ORF-1 of the msrAB gene with MsrAs of other organisms. MsrAs included in the alignment represent the best hits obtained from prokaryotic and eukaryotic BLASTp searches. Gaps were inserted to maximize alignment, and identical residues are highlighted in gray. Numbers in brackets indicate the percentage of sequence identity of each amino acid sequence with E. raikovi MsrA. Aligned sequences have the following GenBank ID: Thalassobacter arenae, WP_021102447; Sinorhizobium fredii, YP_006401320; Oceanicola sp., WP_010137233; Ruegeria lacuscaerulensis, WP_005979692; Rhizobium sp., WP_018236324; Sphingopyxis sp., WP_003045039; Roseibium sp., WP_009759924; Pelagibaca bermudensis, WP_007796742; Citreicella sp., WP_008887323; Nitratireductor aquibiodomus, WP_007008964; Pantholops hodgsonii, XP_005978873; Paramecium tetraurelia, XP_001431627; Tetrahymena thermophila, XP_001020577. Rhodobacterales, green; Rhizobiales, brown; Sphingomonadales, blue; eukaryotic organisms, black.

The second ORF (ORF-2), spanning from ATG at position 305 to TAG at position 1150 and partially overlapping with ORF-1, includes (at the beginning of the overlapping region) an in-frame TGA codon which, however, is most likely not used to stop translation. At least in principle, it should code for cysteine or selenocysteine, as TGA usually does in Euplotes [10]. The 152-amino acid N-terminal region of the 281-amino acid sequence encoded by this ORF shows significant relationships not with other MsrA proteins, but with bacterial MsrBs lacking Cys-Xxx-Xxx-Cys Zn-ion binding motifs [3],[4]. Its alignment is much closer (72-78% of structural identity) to MsrBs of Rhodobacterales, such as Roseovarius, Roseobacter, Thalassobacter and Oceanicola, and Rhizobiales, such as Sinorhizobium and Rhizobium, than to any eukaryotic MsrB, including the MsrB of E. raikovi itself (Figure 3).
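The effect of reading TGA as a sense codon can be illustrated with a short Python sketch. It builds the standard codon table in the usual NCBI base order (T, C, A, G) and optionally reassigns TGA to cysteine, as in the Euplotid nuclear code (NCBI translation table 10). The helper names `codon_table` and `translate` and the toy sequence are hypothetical; the selenocysteine alternative mentioned above is not modeled.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code in NCBI order (TTT, TTC, TTA, TTG, TCT, ...).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def codon_table(tga_as: str = "*") -> dict:
    """Return a codon -> amino acid map; `tga_as` sets the meaning of TGA."""
    table = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD)}
    table["TGA"] = tga_as
    return table


def translate(dna: str, table: dict) -> str:
    """Translate from the first base, stopping at the first stop codon ('*')."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy ORF (hypothetical, not from msrAB): ATG AAA TGA GGT TAA
toy = "ATGAAATGAGGTTAA"
standard_reading = translate(toy, codon_table("*"))   # TGA read as stop
euplotid_reading = translate(toy, codon_table("C"))   # TGA read as cysteine
print(standard_reading, euplotid_reading)
```

Under the standard code the toy peptide is truncated at the internal TGA, whereas under the Euplotid assignment translation continues to the TAA stop.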

Figure 3

Sequence alignment of the 153-amino acid N-terminal region of the Euplotes raikovi MsrB protein (red) encoded by ORF-2 of the msrAB gene with MsrBs of other organisms. The MsrBs included in the alignment represent the best hits obtained from prokaryotic and eukaryotic BLASTp searches. Gaps were inserted to maximize alignment, and identical residues are highlighted in gray. Numbers in brackets indicate the percentage of sequence identity of each amino acid sequence with E. raikovi MsrB. Aligned sequences have the following GenBank ID: Roseovarius nubinhibens, WP_009814088; Roseobacter sp., WP_007811995; Sinorhizobium meliloti, WP_018098563; Rhizobium sp., WP_018236325; Nitratireductor aquibiodomus, WP_007008963; Thalassobacter arenae, WP_021102446; Sphingopyxis sp., WP_003044951; Oceanicola granulosus, WP_007254905; Roseibium sp., WP_009759925; Mesorhizobium alhagi, WP_008840482; Ruegeria conchae, WP_010442903; Pantholops hodgsonii mitochondrial-like, XP_005955290; Euplotes raikovi, AFZ61875; Paramecium tetraurelia, XP_001426263; Tetrahymena thermophila, XP_001019714. Rhodobacterales, green; Rhizobiales, brown; Sphingomonadales, blue; eukaryotic organisms, black.

The third ORF (ORF-3), spanning from ATG at position 78 to TAA at position 305 and containing another in-frame TGA, encodes a 75-amino acid protein not related to Msr proteins. Its 40-amino acid N-terminal segment is 55-60% identical to the C-terminal sequence of the LysR-type transcription regulator of Rhizobium, Sinorhizobium, and Sphingopyxis. In addition to being strongly conserved among Rhizobiales and Sphingomonadales [11],[12], this regulatory protein is known to be encoded by genes carried on DNA regions that are transferred from one bacterial genome to another [11].
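The coordinates reported for the three ORFs can be cross-checked with simple arithmetic. The Python sketch below assumes that each end position marks the last nucleotide of the stop codon; this is an inference from the reported protein lengths, not something stated explicitly in the text. The names `ORFS`, `describe`, and `overlap` are hypothetical.

```python
# (start of ATG, assumed last nucleotide of the stop codon), 1-based positions from the text.
ORFS = {"ORF-1": (763, 1269), "ORF-2": (305, 1150), "ORF-3": (78, 305)}


def describe(start: int, end: int) -> dict:
    """Length, frame and encoded residues of an ORF given inclusive 1-based coordinates."""
    length = end - start + 1
    codons, remainder = divmod(length, 3)
    return {
        "nt": length,
        "whole_codons": remainder == 0,
        "residues": codons - 1,        # the stop codon encodes no residue
        "frame": (start - 1) % 3,      # reading frame relative to position 1
    }


def overlap(a: tuple, b: tuple) -> int:
    """Number of nucleotides shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


summary = {name: describe(*span) for name, span in ORFS.items()}
shared_1_2 = overlap(ORFS["ORF-1"], ORFS["ORF-2"])

for name, info in summary.items():
    print(name, info)
print("ORF-1/ORF-2 overlap (nt):", shared_1_2)
```

Under this reading, the computed lengths (168, 281 and 75 residues) agree with the values given in the text, and the three ORFs fall in three different reading frames.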

To obtain evidence that the msrAB gene is functional and effectively expressed, cDNA preparations were obtained from cells previously induced to increase their anti-oxidative enzyme synthesis by a mild oxidative stress (generated by a 30-min suspension in 300 μM H2O2), and subjected to PCR amplification with primer combinations specific to each ORF (Additional file 3: Figure S2). Two MsrA-specific products, of 368 bp and 660 bp, were obtained, indicating that ORF-1 is either the only one to be expressed, or is expressed to a much higher extent than the other two ORFs.

The bacterial origin of the three ORFs of the E. raikovi macronuclear msrAB gene is well explained by a comparative analysis with the organization of the MsrA, MsrB, and transcription-regulator gene sequences in the Thalassobacter arenae, Sinorhizobium meliloti and Sphingopyxis alaskensis genomes [13]-[18]. In all these Alphaproteobacteria, the MsrB and MsrA coding genes lie adjacent to one another, and the TGA stop codon of the MsrB coding region either partially overlaps the ATG start codon of the MsrA coding region (T. arenae), or is separated from it by no nucleotides (S. meliloti) or by only two nucleotides (S. alaskensis) (Figure 4 and Additional file 4: Figures S3-S5). In addition, in T. arenae and S. alaskensis the transcription-activator gene is located apart from the Msr coding genes [13],[18]. In S. meliloti, instead, it lies only 114 bp from the ATG of the MsrB coding region [14], and the MsrA/MsrB/transcription-activator gene cluster does not lie in the chromosome, but in one of the two symbiotic mega-plasmids (or chromids) [14]-[17].

Figure 4

Comparative structural analysis of the Euplotes raikovi msrAB gene with the MsrA and MsrB coding genes of Thalassobacter arenae, Sinorhizobium meliloti, and Sphingopyxis alaskensis. ORFs are represented by arrows pointing in the direction of transcription and extending between the indicated nucleotide positions. Red, green and blue colors highlight the MsrA, MsrB and LysR-transcription regulator ORFs, respectively. Gray and black bars indicate regions of bacterial genes with 70% and 73-75% nucleotide sequence identity with the msrAB gene, respectively (see also Additional file 4: Figures S3-S5). Inter-ORF bars indicate non-coding regions and their relative extensions, while the filled boxes in the msrAB gene indicate the telomeric ends. T. arenae, S. meliloti and S. alaskensis sequence GenBank accession numbers are GCA_000442275.1, CP003936.1 and CP000356.1, respectively.

Conclusions

Genome analysis of a large variety of pro- and eukaryotes indicates that gene transfer among the three domains of life is a recurrent phenomenon in biological evolution. It also suggests that eukaryotic genomes preferentially retain those prokaryotic genes which encode enzymes capable of conferring adaptive and evolutionary advantages [19]-[21]. The finding that E. raikovi uses Msrs from Alphaproteobacteria to repair methionine-oxidized proteins supports these concepts, and implies that ciliates in general expand their genetic resources through the acquisition of bacterial gene sequences.

The pervasive tendency of Euplotes species to host endosymbiotic bacteria in their cytoplasm [22], and the fact that Rhizobiales include numerous symbiotic species [23], would suggest that the origin of the msrAB coding sequence lies in some Sinorhizobium species living as endosymbionts in E. raikovi. However, present-day stable cytoplasmic hosts of E. raikovi appear to be Gammaproteobacteria, in primis Francisella endociliophora [24],[25], which have Msr genes with sequences markedly different from those of the E. raikovi msrAB coding sequences (personal communication from Dr. Andreas Sjödin, CBRN Defence and Security Department, Swedish Defence Research Agency, Umeå).

An alternative hypothesis accounting for the origin of the msrAB gene is suggested by Doolittle’s aphorism “you are what you eat” [26]. It considers that the origin of the msrAB gene resides in some Rhodobacterales or Rhizobiales species that are usually ingested as food by E. raikovi. Molecular investigations and cultivation-based studies have consistently revealed that both Rhodobacterales of the so-called marine alpha group and Rhizobiales of the genus Rhizobium are cosmopolitan and dominant members of microbial communities in marine sediments [27]-[31]. Furthermore, they contribute to the Mediterranean subsurface microbial community of which E. raikovi is a common member [32].

Methods

Cell cultures

The Euplotes raikovi cultures used in this study derive from the wild-type strain #13, deposited at the ATCC Center (catalog #PRA-327) and collected in June 1979 from a sandy coastal site (Porto Recanati, 43°26’N, 13°43’E) on the Adriatic coast of Italy [32]. They were fed on the green alga Dunaliella tertiolecta, grown in pasteurized natural seawater enriched with Walne medium.

DNA purification and amplification

Total DNA preparations were obtained, according to a published procedure [33], from cultures deprived of food for 34 days and concentrated by centrifugation (2500 x g for 5 min). Degenerate primers were designed with the CODEHOP (Consensus-Degenerate Hybrid Oligonucleotide Primers) method [34] on the basis of the following two MsrA conserved sequence stretches: Leu-Ala-Gly-Gly-Cys-Phe-Trp and His-Asp-Pro-Thr-Thr-Leu-Asn-Arg-Gln-Gly. All the PCR amplifications were run in an Eppendorf Mastercycler (Eppendorf AG, Hamburg, Germany), using 0.5-μg DNA aliquots as template in 50-μl reaction mixtures containing 0.25 μM of each primer, 0.3 mM dNTP, 1x buffer, and 1 U of Perfect-Taq DNA Polymerase (Eppendorf). After an initial DNA denaturation step at 95 °C for 4 min, 35 cycles of 95 °C for 30 sec, 58 °C for 40 sec, and 72 °C for 1 min were run. A final incubation step, at 72 °C for 7 min, was added to the last cycle. Gel-purified PCR products were ligated into pGEM-T Easy Vector (Promega, WI) and transformed into TOPO 10 cells (Invitrogen, Life Technologies Corporation, Carlsbad, CA, USA). Colonies were selected for PCR amplification to screen for the presence of inserts using standard M13 primers, and the products were sequenced at the BMR Genomics Center of the University of Padua.
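For orientation, the quantities implied by this protocol can be worked out with a few lines of Python. The sketch only restates the figures given above; it ignores thermocycler ramp times, and it makes no assumption about whether 0.3 mM refers to each dNTP or to the total dNTP pool, since the text does not say. The function names are hypothetical.

```python
def cycling_seconds(initial_s: int, step_s: list, cycles: int, final_s: int) -> int:
    """Programmed hold time of a PCR run, excluding temperature ramps."""
    return initial_s + cycles * sum(step_s) + final_s


def picomoles(conc_um: float, volume_ul: float) -> float:
    """Amount in pmol, since 1 uM x 1 ul = 1 pmol."""
    return conc_um * volume_ul


# 95 C for 4 min; 35 cycles of 30 s + 40 s + 60 s; final 72 C for 7 min.
total_s = cycling_seconds(4 * 60, [30, 40, 60], 35, 7 * 60)

# 50-ul reaction: 0.25 uM of each primer, 0.3 mM (= 300 uM) dNTP.
primer_pmol = picomoles(0.25, 50)
dntp_nmol = picomoles(300, 50) / 1000

print(f"programmed time: {total_s} s ({total_s / 60:.1f} min)")
print(f"each primer: {primer_pmol} pmol; dNTP: {dntp_nmol} nmol")
```

The programmed time comes to 5210 s, i.e. a little under an hour and a half per run before ramping is taken into account.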

RNA extraction and cDNA synthesis

RNA was extracted from cells incubated with H2O2 (300 μM) for 30 min, harvested by centrifugation, and lysed in Trizol reagent (Ambion, Life Technologies Corporation, Carlsbad, CA, USA). It was then purified with the PureLink RNA mini kit (Ambion) following the procedure described by the manufacturer, and digested with RNase-free DNase I to remove contaminating DNA. Single-stranded cDNA was synthesized following the 3’ RACE protocol of the FirstChoice RLM-RACE kit (Ambion), and 50-ng aliquots were then used in PCR analysis.

Sequence analysis and accession number

BLAST analysis (http://www.ncbi.nlm.nih.gov/BLAST) and ClustalW (http://www.genome.jp/tools/clustalw) were used to search for the nearest relative sequences and perform multiple sequence alignments, respectively. The msrAB sequence has been deposited to GenBank under the accession number KM197136.
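The percentage identities reported for the alignments depend on how the denominator is defined, which is not specified here. As a minimal sketch, the Python function below counts identical residues over alignment columns in which neither sequence has a gap; this convention is an assumption, and BLAST or ClustalW may report values computed differently. The function name and the toy alignment are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for res_a, res_b in zip(seq_a, seq_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy alignment (hypothetical): 9 ungapped columns, 8 of them identical.
row_a = "LAGGCFW-HD"
row_b = "LAGGCFWQHE"
identity = percent_identity(row_a, row_b)
print(f"{identity:.1f}% identity")
```

Other conventions, such as dividing by the full alignment length or by the length of the shorter sequence, would give lower values for the same alignment.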

Authors’ contributions

ND and AV conceived the study. PL prepared the biological material for the experiments. ND, AC and FR carried out the experiments. AV analyzed the data. AV and PL wrote the manuscript. All the authors have read the article and approved the final manuscript.

Additional files