Introduction

The Synurophyceae, a class of photosynthetic stramenopile (or heterokont) algae, is a morphologically diverse lineage with plastids derived from red algae via secondary or tertiary endosymbiosis. They are motile organisms with two parallel emergent flagella, one or two plastids, a cell coat in which siliceous scales cover the entire cell, and no eyespot [1, 2]. Species of Synurophyceae are presently assigned to one of three genera: Mallomonas, Neotessella and Synura. Members of the genus Synura are colonial flagellates characterized by cells having two visible and unequal flagella, two plastids, and an external covering of siliceous scales. Members of the single-celled genus Mallomonas have silica scales and bristles, while the colonial genus Neotessella is characterized by an oval-shaped scale structure and a single scale case that surrounds the whole colony.

The presence of four membranes surrounding synurophycean plastids provides direct evidence for the hypothesis that their plastids are derived by eukaryote-eukaryote endosymbiosis, a process that is thought to have given rise to photosynthesis in several other protist lineages (e.g., cryptophytes, haptophytes, euglenoids, chlorarachniophytes; [3,4,5]). The outermost membrane of synurophycean plastids is continuous with the endoplasmic reticulum (referred to as the chloroplast ER; CER), but unlike the red algal-derived plastids of other ‘chromists’ such as haptophytes and cryptophytes, a linkage between this membrane and the outer nuclear envelope is either totally lacking or marginal [1]. The silica deposition vesicles (SDVs), in which siliceous scales form, are produced from the CER on the outer side of the plastid. The mature scales are brought to the cell surface via Golgi body vesicles and placed in position alongside pre-existing scales [6, 7].

Molecular sequence datasets that include combinations of nuclear, plastid, and mitochondrial genes have provided insight into the branching order amongst the three recognized synurophyte genera and their phylogenetic relationship to other algae [8,9,10]. Recently, the plastid genome of the chrysophycean alga Ochromonas sp. CCMP1393 was reported [5]. The Ochromonas species genome was ‘conservative’ in possessing a large single copy region (LSC), a very short single copy region (SSC), and two inverted repeats (IR) with 15 functional protein-coding genes and ribosomal RNA operons. Recent phylogenomic studies of photosynthetic stramenopiles based on plastid genome data have focused mainly on Bacillariophyceae (diatoms) and Phaeophyceae (brown algae) with one or a few species from six additional classes—Bolidophyceae, Chrysophyceae, Eustigmatophyceae, Pelagophyceae, Raphidophyceae, Xanthophyceae—as well as plastid-bearing alveolates [5, 11]. While photosynthetic stramenopiles consist of at least 15 classes, phylogenetic relationships amongst them, including Synurophyceae, are still unresolved. Investigation of organellar genome structure and coding capacity from new protists has the potential to complement phylogenetic analyses by reinforcing observed relationships and helping to resolve phylogenetic issues.

Lateral gene transfer (LGT; also known as horizontal gene transfer) is the movement of genetic material from one species into the genome of an unrelated species. LGT provides a potentially important source of genetic variation in mitochondrial and plastid genomes, many of which display intron gain/loss and the presence of chimeric genes created by gene conversion. LGT appears to be rare in algal plastid genomes, but a few probable cases of bacterially derived genes have nevertheless been documented. These include the leucine biosynthesis (leuC/D) operon and RuBisCO genes of red algae [4, 12], the dnaX gene and group II introns in cryptophytes [13,14,15,16], the rpl36 gene in the haptophyte and cryptophyte plastid genomes [13] and the ebo operon in Eustigmatophyceae [17]. In addition, the plastid genomes of several angiosperms show evidence for LGT of one or more genes from mitochondria to plastids [18,19,20]. Although in most cases the underlying mechanisms are not known, taken together these examples strongly suggest that LGT from bacteria to organelles and from one organelle to another can occur.

Here, we present five complete plastid genome sequences from the three morphologically distinct genera of synurophycean algae: Mallomonas, Neotessella, and Synura. To better understand the relationships among synurophytes and to gain broader insight into the evolution of plastid genomes in stramenopiles, we performed comparative and phylogenetic analyses of these genomes in the context of publicly available plastid genome sequence data, including that of the chrysophycean alga, Ochromonas sp. CCMP1393. Our results reveal highly conserved features of plastid genomes amongst the synurophyceans. We also uncovered several examples of gene loss/gain, duplication and gene rearrangement. Our results provide important insights into the evolutionary history of organelle genomes via lateral gene transfer (LGT) from green-algal lineages into the Synurophyceae, as well as divergence time estimates using molecular clock approaches based on silica fossil records. Collectively, our data contribute to a better understanding of the evolutionary history of the Synurophyceae.

Results

General features of Synurophyceae plastid genomes

Five new plastid genomes (ptDNA) were sequenced from the synurophycean genera Mallomonas, Neotessella, and Synura (Table 1, Fig. 1). The structure and coding capacity of these ptDNAs were then compared to the published genome of the related chrysophycean alga, Ochromonas sp. CCMP1393 [5]. The plastid genome sizes of the Synurophyceae ranged from ~130 kbp (S. sphagnicola) to ~147 kbp (M. splendens) and the overall GC content ranged from 37.5 to 42.4%. The overall organization of the five synurophyte genomes and that of Ochromonas sp. CCMP1393 was found to be conserved: they each contain a large single copy region, a very short single copy region, and two inverted repeats. The plastid genomes of Synurophyceae share a core set of 134 functional protein-coding genes including genes in the IR regions (Table 2). The plastid genome IR sequence length of the Synurophyceae and Ochromonas sp. CCMP1393 ranged from 22.5 kbp to 31.6 kbp with 15 functional protein-coding genes (ccs1, ccsA, chlI, petJ, petM, petN, psaC, psaM, psbA, psbC, psbD, rpl21, rpl27, rpl34, and secA), 3 rRNAs and 5 tRNAs. Introns and pseudogenes were found among the tRNA genes. The trnRUCU genes in S. petersenii, S. uvella, and M. splendens and the trnEUUC gene in N. volvocina are present as pseudogenes; each has a low hidden Markov model score (HMM score = 0) and a low secondary structure-only score (2Str Score < 40) as predicted by tRNAscan-SE. Synura and Mallomonas have an intron in trnLUAA, whereas N. volvocina has introns in trnPUGG and trnSGCU (Table 3, Fig. 2). The five newly determined plastid genomes showed a high degree of structural conservation relative to the representative species, Synura petersenii (Fig. 1 and Additional file 1: Figure S2). Gene order among Synurophyceae and Ochromonas ptDNAs was conserved, with the exception of N. volvocina, which has two inversions that differ from other synurophycean species (Fig. 1).
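The genome size and GC content figures reported above are simple summary statistics of an assembled sequence. The following Python sketch, which is not the authors' pipeline, shows one way such values could be computed; the function names and the toy inputs are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    gc = counts["G"] + counts["C"]
    total = gc + counts["A"] + counts["T"]
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / total


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a plastid genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


if __name__ == "__main__":
    toy_sequence = "ATGCGCATTAAT"  # toy input, not real plastid data
    print(f"GC content: {gc_content(toy_sequence):.1f}%")
    # Hypothetical region lengths (bp), used only to show the arithmetic.
    print(quadripartite_length(lsc=100, ssc=10, ir=20))
```

Ambiguous bases such as N are ignored in the denominator here; that is a design choice of this sketch, and other tools may count them differently.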

Table 1 Characteristics of Synurophyceae plastid genomes analyzed in this study
Fig. 1

Circular map of the plastid genome of Synura petersenii. The gene content and arrangement of the synurophycean plastid genomes examined herein are identical, with the exception of the six syntenic regions shown as (a-f). Regions of the N. volvocina genome are shown in gray. The protein coding genes, rRNA and tRNA genes are labeled inside or outside of the circle. The genes are color-coded according to the functional categories in the index

Table 2 List of genes in the synurophycean plastid genome
Table 3 tRNAs present in Synurophyceae plastid genomes.
Fig. 2

tRNA structures with intron sequences in synurophycean plastid genomes. Collectively, these tRNAs show outright gene losses, pseudogenization, and intron insertions. The nucleotides in red indicate the anticodon

Gene arrangements of IR and SSC regions

The synurophycean plastid genomes exhibit different gene order patterns in six distinct regions (Fig. 1a-e). First, three different gene order patterns were found at the boundaries between single-copy and repetitive regions (Fig. 1a). The most common pattern is that shared by Ochromonas sp. CCMP1393, N. volvocina and M. splendens, in which a particular block of genes (trnS-psaD-trnM-ycf36-petM-petN) lies between the dnaK and chlI genes at the IRA/LSC junction (Fig. 1a). Within this syntenic block, S. uvella and S. sphagnicola have one open reading frame (ORF) between ycf36 and petM, whereas S. petersenii is distinct in the loss of trnS-psaD-trnM-ycf36 (Fig. 1a). Second, four different gene order patterns were observed at the IR/SSC junction (Fig. 1b). Gene content in this region of the genome is conserved, but four genes (psaC, ccsA, psbA, trnN) are dynamically rearranged in the IR regions of each species. The gene rearrangements in the plastid genome of S. uvella are distinct from other synurophyceans. The IRB/LSC junctions of the synurophyte plastid genomes also showed three different gene order patterns (Fig. 1d). The first pattern, shared by Ochromonas sp. CCMP1393, M. splendens and S. petersenii, involves the ilvB gene linked to the trnS-psaD-trnM genes. It probably represents the ancestral gene order in the synurophycean algae. The second pattern is seen only in S. uvella, one in which there is an ORF in the plus orientation between ycf36 and petM. The third pattern is shared by N. volvocina and S. sphagnicola; it involves the presence of a partial dnaK gene (the duplicated region starts at the 440th amino acid) between the ilvB and trnS-psaD-trnM genes.

The small single copy (SSC) region is exceptionally short, ranging from 711 bp to 2939 bp, and includes only two protein-coding genes: ycf54 and psbY (Fig. 1c). The SSC flanking regions have slightly different patterns of gene duplication and location of the genes in each species. The psbY gene is located to the left side of ycf54 gene in S. petersenii and Ochromonas sp. CCMP1393, while it is located on the right side in M. splendens. Synura sphagnicola has duplicated psbY genes on both sides of ycf54, but the ycf54 gene is absent in plastid genome of N. volvocina. Synura uvella has duplicated psbY-ycf54 genes in extended IR regions.

Discussion

Expansions and contractions of IR region

Expansions and contractions of the IR region have occurred frequently during the evolutionary history of Synurophyceae; in our results they led not to changes in gene content, but to gene rearrangements and gene duplications. Such events can alter gene order through inversion, expansion/contraction of the IR, gene duplication/loss, or transposition. IR boundary shifts are a common phenomenon, thought to be caused by inversions or recombination between repeated sequences, and they result in gene order changes in plastid genomes [21, 22]. Contractions, expansions and small-scale changes in IR and SSC regions appear to be common in diatoms and green algae, leading to dynamic gene rearrangements and changes in gene content [23,24,25,26]. Rearrangements at the IR boundary are likely one of the factors contributing to the extensive genome rearrangements in the Synurophyceae as well.

Lineage specific gene gain and loss

Previous work has shown that in red-algal derived secondary plastids, most of the lineage-specific plastid genes show complex distribution patterns suggesting independent losses across a broad range of phylogenetic depths [16]. Although the plastid genomes of Synurophyceae and Ochromonas sp. CCMP1393 studied herein are generally highly conserved in structure and gene content, three genes were identified as being lineage specific: dnaB, syfB, and cemA (Fig. 1e-f). To understand the evolutionary distribution and phylogenetic relationships of these genes among eukaryotes, we performed phylogenetic analyses of homologs obtained from the plastid genomes of major photosynthetic eukaryotic groups with their cyanobacterial homologs.

The dnaB gene encodes a DNA helicase that is involved in organelle division [27, 28] and is found in the plastid genome of cryptophytes, some dinoflagellates (i.e., those with diatom-derived plastids), and specific subgroups of stramenopiles and rhodophytes (Additional file 2: Figure S3; [16]). Interestingly, dnaB in Synurophyceae was only found in the genera Synura and Mallomonas (Fig. 1e and Additional file 2: Figure S3). The synurophyte sequences branch at the base of the photosynthetic stramenopiles in the algal dnaB gene tree (Additional file 2: Figure S3). In stramenopiles, dnaB is present only in Bacillariophyceae (except Synedra acus), Phaeophyceae, Raphidophyceae, Triparma, Synurophyceae and Xanthophyceae, but absent in Pelagophyceae, Eustigmatophyceae, and Chrysophyceae [16]. Of particular note, the ‘dinotoms’ Durinskia baltica and Kryptoperidinium foliaceum, which are dinoflagellates harboring a diatom endosymbiont, contain a dnaB gene in their plastid genomes [29]. The cryptophytes, which also harbor a complex red algal-derived plastid, branch with the main red algal lineage including Galdieria sulphuraria. The dnaB gene appears to have been present in the plastid genome of the red algal common ancestor; if it was present in the common ancestor of all primary plastid-bearing algae, it was lost in green algae and glaucophytes, and independently in many complex red algal-derived plastid genomes (Additional file 2: Figure S3).

The syfB gene encodes the β subunit of phenylalanyl-tRNA synthetase [30]. While the syfB and syfH genes are retained in primary plastid-bearing organisms, the syfH gene is absent in complex red-algal plastid genomes. Furthermore, syfB has been lost in almost all red algal-derived plastid genomes, with the exception of the diatom Triparma and Chrysophyceae in stramenopiles [5, 11, 24, 31], as well as rhodophytes [32]. In the Synurophyceae and Chrysophyceae, syfB remains in most species, but is not present in S. petersenii and S. sphagnicola, suggesting that it was lost recently in a common ancestor shared by these two species (Fig. 1e, and Additional file 3: Figure S4). The dinotoms Durinskia baltica and Kryptoperidinium foliaceum also have a syfB gene in their plastid genomes. It is likely that the syfB gene has a history similar to that of dnaB, i.e., being ancestrally present and lost independently in specific groups.

Lateral gene transfer from green algae into photosynthetic Stramenopiles

The cemA gene encodes a chloroplast inner membrane protein [33]; it is conserved in almost all green algae, liverworts, land plants and rhodophytes, but is not found in glaucophytes [34, 35]. In the green algal order Bryopsidales, the cemA gene appears to have been lost in two clades [36]. Among the groups of algae with red-algal derived complex plastids, the cemA gene is thus far only found in cryptophytes and Synurophyceae (Figs. 1f and 3), and our phylogenetic analyses suggest that the cemA homologs in these two lineages have different origins: the cryptophyte protein shows a strong phylogenetic relationship with rhodophytes, whereas the synurophycean protein groups within Viridiplantae (Fig. 3). At face value, this is consistent with the hypothesis that the synurophycean plastid cemA gene was derived from a member of the green lineage through LGT. It is, however, not possible to be more specific than this; the synurophycean homologs are extremely divergent and branch sister to long-branching cemA proteins in streptophytes, rather than those of chlorophytes.

Fig. 3

Phylogenetic tree based on cemA protein sequences. Numbers on branches are IQ-Tree UFBoot values. The scale bar shows the inferred number of amino acid substitutions per site

Evolution of the trnLUAA intron

The trnLUAA group I intron of algae is thought to have been acquired from the ancestral cyanobacterial endosymbiont that gave rise to the plastid. The existence of related introns in the trnLUAA gene has been reported in most green algal plastid genomes, as well as some stramenopiles [37]. A phylogenetic analysis of the intron suggests that it was present in the cyanobacterial ancestor of the three primary plastid-bearing lineages, i.e., Rhodophyta, Viridiplantae, and Glaucophyta (Additional file 4: Figure S5). The trnLUAA group I intron is absent from red, cryptophyte and haptophyte algae, and found only in some stramenopiles, i.e., Phaeophyceae, Phaeothamniophyceae, Xanthophyceae, and Eustigmatophyceae ([5, 38, 39], this study). Given the high degree of intron sequence similarity between these four subgroups of stramenopiles, the trnLUAA gene is probably derived from the same ancestral archaeplastidal sequence. One notable feature is the presence of a predicted group I intron in all trnLUAA and trnPUGG (Neotessella volvocina) genes in the synurophycean plastid genome (Fig. 2). The group I intron sequences are more closely related to homologs in the green algal lineage and chlorarachniophytes (which have a green algal secondary plastid), rather than other stramenopiles (Additional file 4: Figure S5). However, it is not possible to infer the origin of the trnL intron with certainty, as the structure of the trnLUAA group I intron tree is generally very poorly supported.

Evolutionary history of plastid genomes in Synurophycean algae

Phylogenomic analysis using 91 genes of plastid genome data showed a monophyletic, strongly supported (MLB = 100%) synurophycean clade; internal relationships among the three genera were also well resolved (Fig. 4). In our maximum likelihood (ML) phylogeny, the genus Neotessella is the deepest branching synurophycean lineage, with Mallomonas and Synura splitting off thereafter. Furthermore, our phylogenomic investigations show that the Synurophyceae form a strongly supported sister relationship with the chrysophytes (Fig. 4), which is congruent with previous multigene phylogenetic studies [9, 10, 40,41,42]. Interestingly, the chromerids V. brassicaformis and C. velia form a strongly supported sister relationship with Eustigmatophyceae in Fig. 4. This topology is consistent with recent studies suggesting that the eustigmatophytes could be the source of the chromerid plastid [5, 43].

Fig. 4

Phylogenetic tree of synurophyte plastids. This tree was constructed using a dataset of 91 concatenated proteins (18,250 amino acids). The numbers on each node represent ultrafast bootstrap approximation (UFBoot) values (left) calculated using IQ-Tree and posterior probabilities (right). Bold branches indicate strongly supported nodes (ML = 100 / PP = 1.00). The scale bar indicates the number of substitutions/site

Synurophycean algae are characterized by the presence of distinctive siliceous scales that produce a highly organized covering around the cell [8, 44]. The fossil record is rich in Synurophyceae containing siliceous scales and cysts, which are resting stages produced by species of the Synurophyceae as well as Chrysophyceae [8, 45]. According to Siver et al. [8], the Synurophyceae originated in the Jurassic, approximately 157 million years ago (Ma), with the clade containing Mallomonas and Synura diverging during the Early Cretaceous at 130 Ma.

Using molecular clock data and our plastid genome phylogenies, we inferred the timing of gene gains, losses, and rearrangements in the plastid genomes of the synurophycean lineage. N. volvocina is predicted to have lost the dnaB gene in the plastid genome between ~ 156 Ma and the present, after the major synurophycean lineages diverged (Fig. 5. ①). The syfB gene loss may have occurred during the Early Cretaceous at 130 Ma, after the divergence of colonial Synura and unicellular Mallomonas; this is inferred because the gene is found in the plastid genomes of Mallomonas, Synura, Neotessella and ochrophytes (Fig. 5. ②). The cemA gene, hypothesized to have been derived from a member of the Viridiplantae by LGT, and the intron of the trnLUAA (Synura and Mallomonas) or trnPUGG (N. volvocina) genes appear to have been acquired during the Jurassic approximately 156 Ma, before the divergence of the Mallomonas, Neotessella, and Synura genera (Fig. 5. ③-④). The psaD gene, located near dnaK in the LSC/IRA junction, appears to have been lost ~ 93 Ma before present because the gene is found in all genera except S. petersenii (Fig. 5. ⑤). The partial copy of the dnaK gene near the ilvB gene in the SSC/IRB junction might have been duplicated or truncated recently, given that it is present only in S. uvella and N. volvocina (Fig. 5. ⑤-⑥). The gene rearrangements in the IR regions and duplications/translocations in the SSC regions are the result of species-specific events (Fig. 5. ⑦).

Fig. 5

Molecular timeline of synurophyte plastid genomes. Putative gene loss, LGT, and intron insertion events are mapped onto a schematic tree modified from the time-calibrated multi-gene phylogeny of Siver et al. [8]. The number at each node represents the mean divergence time (in millions of years)

Conclusions

We have sequenced five synurophyte plastid genomes from morphologically distinct genera: the colonial genus Synura, whose individual cells are covered with silica scales; the single-celled genus Mallomonas covered with silica scales and bristles; and the colonial genus Neotessella, whose entire colony is covered with a single, large silica case. The overall organization of the plastid genome shows a high degree of conservation among the five Synurophyceae and Ochromonas sp. CCMP1393, but N. volvocina has two inversions relative to the other synurophycean species. The IR and SSC boundaries are particularly variable from species to species. Instances of lineage specific gene loss/gain and intron insertions were also detected (e.g., cemA, dnaB, syfB, and trnL). The dnaB and syfB genes appear to have been lost independently in different synurophyceans. Both the trnL intron sequences and cemA gene of Synurophyceae appear most closely related to their counterparts in green algae, suggestive of LGT. However, their sequences are divergent and should thus be interpreted with caution. All things considered, the extent to which LGT has contributed to the plastid genomes of Synurophyceae and other algae remains to be seen. Multi-gene phylogenetic analyses show that Synurophyceae group together with Chrysophyceae among the stramenopiles. Combined with molecular clock data, our phylogenetic tree allows us to infer the timing of gene gains, losses, duplications and rearrangements in the plastid genome of the synurophycean lineage.

Materials and methods

Cultures and sequencing

Cultures of Neotessella volvocina CCMP1871 and Mallomonas splendens CCMP1872 were obtained from the Culture Collection of the National Center for Marine Algae and Microbiota (NCMA). Three species of Synura were collected from natural habitats: Synura petersenii from Sweden (36° 30′ N, 126° 47′ E), Synura sphagnicola from Cheongyang, Korea (36° 30′ N, 126° 47′ E), and Synura uvella from Gahang, Korea (35° 30′ N, 128° 23′ E). S. petersenii strain S114.C7 has been deposited as CAUP B713 in the Culture Collection of Algae of Charles University in Prague, Czech Republic. The strains S. sphagnicola FBCC200022 and S. uvella FBCC200023 are available from the Freshwater Bioresources Culture Collection at the Nakdong-gang National Institute of Biological Resources, Korea. All freshwater cultures were grown in DY-V medium [46] with distilled water and were maintained at 17 °C under a 14:10 light:dark cycle with 30 μmol photons·m−2·s−1 from cool white fluorescent tubes. All cultures were derived from a single-cell isolate for unialgal cultivation. Total genomic DNA was extracted using the QIAGEN DNeasy Blood Mini Kit (QIAGEN, Valencia, CA, USA) following the manufacturer’s instructions. Next-generation sequencing was carried out using the MiSeq (Illumina, San Diego, CA, USA). The amplified DNA was fragmented and tagged using the NexteraXT protocol (Illumina), indexed, size selected, and pooled for sequencing using the small amplicon targeted resequencing run, which generates paired-end 2 × 300 bp reads using the MiSeq Reagent Kit v3 (Illumina), according to the manufacturer’s recommendations.

Assembly and annotation of plastid genomes

Sequence reads were trimmed and assembled using the SPAdes 3.7 assembler (http://bioinf.spbau.ru/spades), and the reads were then mapped to the assembled contigs. The contigs were deemed to be of plastid genome origin on the basis of two criteria: (1) BLAST searches against the entire assembly using commonly known plastid genes as queries resulted in hits to these contigs [47, 48] and (2) the predicted genome sizes were similar to the previously published 127 kbp plastid genome of Ochromonas sp. CCMP1393 (KJ877675). For each genome we verified the sequence and structure of both inverted repeat positions and SSC regions with specific primers using standard Sanger sequencing (Additional file 5: Figure S1). Protein-coding genes, rRNA genes, and tRNA genes were annotated using data from all previously sequenced synurophycean plastid genomes according to the methods described in Kim et al. [16]. Genome sequences were deposited in the NCBI GenBank database under the accession numbers shown in Table 1.
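The first criterion above (BLAST hits from known plastid genes to assembled contigs) can be illustrated with a short Python sketch. It is a hypothetical example rather than the procedure actually used: it assumes BLAST results in the standard 12-column tabular format (-outfmt 6), with plastid genes as queries and contigs as subjects, and the function name, the E-value cutoff and the minimum number of distinct genes are all assumptions made for illustration.

```python
from collections import defaultdict


def plastid_candidate_contigs(blast_lines, max_evalue=1e-10, min_genes=3):
    """Group significant hits by contig and keep contigs hit by enough genes.

    Expects tab-separated BLAST tabular lines with the default columns:
    qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    The thresholds are illustrative assumptions, not values from the study.
    """
    genes_per_contig = defaultdict(set)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            genes_per_contig[subject].add(query)
    return {
        contig: sorted(genes)
        for contig, genes in genes_per_contig.items()
        if len(genes) >= min_genes
    }


if __name__ == "__main__":
    # Invented hits for demonstration only.
    example = [
        "psbA\tcontig_1\t98.0\t350\t7\t0\t1\t350\t1000\t2050\t1e-150\t600",
        "rbcL\tcontig_1\t95.0\t470\t23\t0\t1\t470\t5000\t6410\t1e-200\t800",
        "psaA\tcontig_1\t96.0\t700\t28\t0\t1\t700\t9000\t11100\t0.0\t1200",
        "psbA\tcontig_7\t30.0\t40\t28\t0\t5\t44\t10\t130\t0.5\t25",
    ]
    print(plastid_candidate_contigs(example))
```

Candidate contigs selected this way would still need the second check described in the text, namely comparison of their total size with a reference plastid genome.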

Phylogenetic analysis

Phylogenetic analyses were carried out on amino acid sequence datasets created by combining 91 protein coding genes from 99 plastid genomes (Additional file 6: Table S1). The sequences of six Viridiplantae and one glaucophyte species were used as outgroup taxa for rooting purposes. The datasets were aligned and concatenated (18,250 amino acids) using MacGDE2.6 [49].
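Concatenating per-gene alignments into a single supermatrix, as described above, amounts to joining each taxon's aligned sequences gene by gene and padding with gaps where a taxon lacks a gene. The Python sketch below illustrates that idea only; the study used MacGDE2.6, and the function name, data layout and toy sequences here are hypothetical.

```python
def concatenate_alignments(gene_alignments):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments: list of dicts mapping taxon name -> aligned sequence.
    Taxa missing from a gene are padded with '-' for that gene's width.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


if __name__ == "__main__":
    # Toy alignments; taxon B lacks the second gene.
    genes = [
        {"A": "MK", "B": "ML"},
        {"A": "QRS"},
    ]
    for taxon, seq in concatenate_alignments(genes).items():
        print(taxon, seq)
```

Padding absent genes with gaps keeps every row the same length, which is what downstream ML and Bayesian programs expect from a concatenated matrix.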

ML phylogenetic analyses of individual protein alignments and concatenated alignments were conducted using IQ-TREE Ver. 1.5.2 [50] with 1000 bootstrap replications. The best evolutionary model for each tree was automatically selected using the -m LG+I+G option incorporated in IQ-TREE. RAxML version 8.0.0 [51] with the general time-reversible plus gamma (GTR + GAMMA) model was used for nucleotide data of the intron within trnLUAA. The model parameters with gamma correction values and the proportion of invariable sites in the combined dataset were obtained automatically by the program. ML bootstrap support values (MLB) were calculated using 1000 replicates with the same substitution model. Bayesian analyses were carried out using MrBayes 3.2.6 [52] with two simultaneous runs (nruns = 2). Four Metropolis-coupled Markov chain Monte Carlo (MC3) chains ran for 2 × 10^6 generations, sampling every 1000 generations. The burn-in point was determined by examining the trace files using Tracer v.1.6 (http://tree.bio.ed.ac.uk/software/tracer/). This analysis was repeated twice independently, and both analyses resulted in the same tree. The trees were visualized using the FigTree v.1.4.2 program, available at http://tree.bio.ed.ac.uk/software/figtree/.
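To make the Bayesian sampling scheme concrete, the short Python sketch below works through the arithmetic of the chain length and sampling interval given above and shows how a burn-in fraction would be discarded. The helper names are hypothetical, and the 25% burn-in fraction is purely an assumed placeholder, since the burn-in point in this study was chosen by inspecting trace files in Tracer.

```python
def n_samples(generations: int, sample_freq: int, include_start: bool = True) -> int:
    """Number of states recorded per run for a given chain length and interval."""
    return generations // sample_freq + (1 if include_start else 0)


def discard_burnin(samples, burnin_fraction: float = 0.25):
    """Drop the first burnin_fraction of the samples (assumed value, see lead-in)."""
    if not 0.0 <= burnin_fraction < 1.0:
        raise ValueError("burnin_fraction must be in [0, 1)")
    cut = int(len(samples) * burnin_fraction)
    return samples[cut:]


if __name__ == "__main__":
    total = n_samples(generations=2_000_000, sample_freq=1000)
    kept = discard_burnin(list(range(total)))
    print(total, len(kept))
```

With 2 × 10^6 generations sampled every 1000 generations, each run records 2000 states after the starting state; how many of these are retained depends entirely on where the burn-in point is set.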