Background

The order Oedogoniales comprises a single family, Oedogoniaceae, with three genera: Oedogonium Link ex Hirn, Oedocladium Stahl, and Bulbochaete Agardh [1,2,3,4]. More than 600 species have been described in this order. Most occur in fresh waters throughout the world, although Oedocladium species are mainly found on soil surfaces and a few species of Oedogonium occur on moist soil surfaces [4,5,6,7,8,9,10,11,12,13,14]. The presence of branches and hairs is the genus-level character used to distinguish genera within this order: Oedogonium has simple, unbranched filaments, Bulbochaete has bulb-based hairs, and Oedocladium has branched filaments [5,6,7,8,9,10,11,12,13,14]. However, some molecular phylogenetic studies on Oedogoniales have suggested that neither Oedogonium nor Oedocladium appears to be monophyletic and that the morphological criteria of Oedogoniales do not define natural groups, which leaves the evolutionary position of these genera unclear [15,16,17,18,19,20].

Chloroplast (cp) genomes are well suited to phylogenetic analysis and molecular evolution studies owing to advantages such as a low evolutionary rate and maternal inheritance [21,22,23,24,25], and plastomes have been increasingly used for phylogenetic and evolutionary studies of green algae. For example, Lemieux et al. [26] conducted a phylogenetic analysis based on the cp genes of 61 chlorophytes and revealed that Trebouxiophyceae is not monophyletic. Zhang et al. [27] demonstrated the mechanism of adaptation to the sea-ice environment by analyzing the molecular evolution of cp protein-coding genes in the Antarctic sea-ice alga Chlamydomonas sp. However, only four cp genomes of Oedogoniales are currently available in public databases [28, 29], which restricts phylogenetic analysis and molecular evolution studies based on cp genomes of this group.

Nucleotide substitution rates are often used to assess selection pressure. Nonsynonymous substitutions (rate dN) change the encoded amino acid, whereas synonymous substitutions (rate dS) do not. The dN/dS ratio is therefore a measure of natural selection acting on the protein. According to Yang [30], dN/dS < 1 denotes negative (purifying) selection, dN/dS = 1 signifies neutral evolution, and dN/dS > 1 indicates positive selection [31]. As most plastid protein-coding genes are under purifying selection to maintain their function, they are conserved and have a low dN/dS ratio. However, some genes might undergo positive selection in response to environmental changes and consequently present a relatively high dN/dS ratio [32,33,34].
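The thresholds described above can be illustrated with a minimal Python sketch. The function name `classify_selection` and the tolerance parameter `tol` are hypothetical and are not part of the analysis reported here; in practice, a ratio close to 1 would be evaluated with a statistical test rather than a fixed threshold.

```python
def classify_selection(dn: float, ds: float, tol: float = 1e-9) -> str:
    """Return the selection regime implied by the dN/dS ratio (omega).

    `tol` is an illustrative numerical tolerance for treating omega as 1;
    it is an assumption of this sketch, not a biological criterion.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form a dN/dS ratio")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        return "neutral"
    return "purifying" if omega < 1.0 else "positive"


# Hypothetical values for illustration only.
print(classify_selection(0.02, 0.50))  # purifying
print(classify_selection(0.60, 0.20))  # positive
```

A single gene-wide ratio averages over all codons, which is one reason why site-specific tests are needed to detect selection acting on only a few positions.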

In this study, the cp genomes of five Oedogonium species were sequenced and analyzed in depth, including a comparative analysis with previously reported Oedocladium and Oedogonium cp genomes. Furthermore, phylogenetic and evolutionary analyses of the order Oedogoniales were conducted based on cp protein-coding genes, and a positively selected gene was identified in Oedocladium species. The results of this study could be useful for understanding the phylogenetic and evolutionary relationships of Oedogoniales.

Results

Species identification

The characteristics of the five Oedogonium taxa are listed in Table 1, and the measurements of these characters for each taxon are given in the Supplementary material. Light micrographs of the five taxa are shown in Figs. 1 and 2. The main features of strain FACHB-3309 (Fig. 1A–D) were almost identical to those of Oe. dentireticulatum Jao [9], except that the apical and basal cells were not observed; because both samples also came from the same locality in Chongqing Province, China, the strain was identified morphologically as Oe. dentireticulatum Jao. Strain FACHB-3310 (Fig. 1E–I) was identified as Oe. crispum (Hassall) Wittrock: its length-width ratio differed from that of the similar Oe. autumnale Wittrock (larger length-width ratio), and it differed from the similar Oe. obesum (Wittrock) Hirn in that the oospore nearly or completely filled the oogonium rather than not filling it [9, 14]. Strain FACHB-3312 (Fig. 2A–E) was identified as Oe. capilliforme Kuetzing, Wittrock [9, 14]; it differed from the similar Oe. plagiostomum Wittrock in lacking a thickened spore wall. For strains FACHB-3311 (Fig. 2F) and FACHB-3313 (Fig. 2G), the complete set of sexual features could not be observed; however, the filaments of both strains were unbranched, indicating that they belonged to the genus Oedogonium. In particular, strain FACHB-3313 exhibited unbranched rhizoids that resembled those of Oedocladium.

Table 1 Matrix of phenotypic traits scored for the five Oedogoniales strains. Character state definitions are given below; unknown character states are denoted by “?”. Polymorphic conditions are indicated with multiple state numbers
Fig. 1

Photos of habitat and light microscopy of Oe. dentireticulatum and Oe. crispum. A–D Oe. dentireticulatum. A Showing the unbranched filament with oogonium, dwarf males and androsporangia. B Showing the dwarf male with two seriate and the oogonium. C Showing the median pore. D Showing the oospore with dentate teeth. E–I Oe. crispum. E Showing the unbranched filament with oogonium and antheridium. F Showing the sperms division horizontal, sperms 2. G Showing the oogonium single, obovoid-globose, operculate, division superior. H Terminal cell apically obtuse. I Elongated basal cell

Fig. 2

Photos of habitat and light microscopy of Oe. capilliforme and two unidentified species. A–E Oe. capilliforme. A Unbranched female filament with oogonium single or 2-continuous, oogonium with superior pore. B Oogonium with median pore. C Antheridia 2–7 in a series, with sperms 2, division horizontal. D Apiculate terminal cell. E Elongated basal cell. F Oe. sp. (strain FACHB-3311), showing the unbranched filament with young oogonium. G Oe. sp. (strain FACHB-3313), unbranched filament with rhizoid (4% formaldehyde fixed sample). Scale bars: 20 μm

General characteristics and comparison of Oedogoniales cp genomes

Table 2 summarizes the cp genome characteristics of the five newly included Oedogonium species, three reported Oedocladium taxa and one reported Oedogonium species. The complete cp genomes of the nine species of Oedogoniales ranged from 146,367 bp (Oe. crispum) to 204,438 bp (O. carolinianum) in length. All five new Oedogonium cp genomes mapped as typical circular molecules with a large single copy (LSC) region (76,475–98,887 bp), a small single copy (SSC) region (43,305–58,055 bp), and two inverted repeat (IR) regions (12,808–35,492 bp) (Supplementary Figs. S1, S2, S3, S4, S5). The overall AT content was comparable among the cp genomes and differed only slightly among the species, ranging from 69.98% (strain FACHB-3311) to 72.66% (O. prescottii); in contrast, the coding proportion varied from 51.4% (O. carolinianum) to 69.5% (O. prescottii). The cp genomes of the six Oedogonium species were moderately compact relative to those of the Oedocladium species. All the cp genomes contained 68 protein-coding genes and three rRNA genes, except for the cp genome of Oe. cardiacum, which has two additional genes (dpoB and int) located in the IR region. The tRNA content differed slightly among the cp genomes: Oe. cardiacum exhibited two additional trnR(ccu) located in the IR regions, and Oe. dentireticulatum (strain FACHB-3309) presented an additional trnR(ccu) in the LSC region; both Oe. crispum and Oe. sp. (strain FACHB-3313) contained two additional trnR(ccg) in the IR regions, and O. carolinianum had an additional trnR(ccg) in the LSC region; O. carolinianum also had an additional trnS(gga) in the LSC region. Sequence repeats of more than 30 bp were less frequent (3.9–4.9%) in the cp genomes of the five Oedogonium species than in the two O. carolinianum cp genomes, but more frequent than in the Oe. cardiacum cp genome.

Table 2 General features of nine oedogonialean chloroplast genomes

Intron content and insertion sites

The intron content and insertion sites of the nine Oedogoniales cp genomes are listed in Table 2 and Supplementary Tables S1 and S2. The nine cp genomes differed markedly in intron content. Oe. sp. (strain FACHB-3311) had the highest intron content, with 26 group I introns and 11 group II introns. Compared with the other Oedogonium cp genomes, the cp genome of Oe. crispum (strain FACHB-3310) showed multiple intron losses, leaving four group I introns (one each in trnL(uaa), psbC, atpA, and psbD) and four group II introns (one each in psbI, petD, psaC, and psaB). In addition, similar to O. prescottii, Oe. crispum had lost the introns in psbA. Oe. sp. (strain FACHB-3311) presented two additional group II introns, in chlB and chlL; this is the first report of introns in these genes. All nine cp genomes included a group I intron in trnL(uaa), which is common across all algal lineages and is considered to originate from the common ancestor of chloroplasts [35]. The nine cp genomes showed some variation in insertion sites. The common group I intron in trnL(uaa) and the group II introns in petB, psaC, and psbI (only strain FACHB-3311 lost the intron in psbI) shared the same insertion sites. For the other intron-containing genes, the insertion sites showed both similarities and variations among species. For instance, in psbA, the number of introns (all of which are group I) differed among the species, whereas the insertion sites of the first intron in Oe. dentireticulatum (strain FACHB-3309), Oe. sp. (strain FACHB-3311), and Oe. sp. (strain FACHB-3313) were identical. The two O. carolinianum genomes were identical in this respect; however, the insertion site of the first intron in Oe. capilliforme was similar to that of the fourth intron in Oe. dentireticulatum and Oe. sp. (strain FACHB-3311).

Synteny analysis and average nucleotide identity analysis

ProgressiveMauve was used to analyze synteny among the Oedogoniales cp genomes, with Oe. cardiacum as the reference for comparing gene order (Fig. 3). More than 19 locally collinear blocks (LCBs) were identified in the cp genomes of the nine species of Oedogoniales, including six taxa from Oedogonium and three taxa from Oedocladium. On the whole, the nine cp genomes showed a high degree of syntenic conservation, with Oe. capilliforme exhibiting high similarity to Oe. cardiacum, and Oe. dentireticulatum resembling Oe. sp. (strain FACHB-3311). However, some rearrangements and inversions were still observed among certain short LCBs, mainly owing to the inversion or loss of introns. Gene order and number were almost identical, except that an inversion of less than 3 kb between trnE(uuc) and petL, including the genes petD and trnR(ucg), was detected in O. carolinianum (MT364369) and O. carolinianum (NC_031510).

Fig. 3

Synteny comparison of Oedogoniales algae chloroplast genomes using progressiveMauve. The coloured syntenic blocks are locally collinear blocks; blocks above the centre line are on the same strand, and blocks below the centre line are on the opposite strand

The average nucleotide identity (ANI) of the nine species of Oedogoniales was calculated using FastANI (Supplementary Fig. S6). Oe. crispum showed high ANI with Oe. dentireticulatum and Oe. sp. (strain FACHB-3311) (90.64 and 90.56%, respectively), Oe. dentireticulatum was similar to Oe. sp. (strain FACHB-3311) with 92.57% ANI, and Oe. capilliforme was similar to Oe. cardiacum with 97.03% ANI.
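As an illustration of how pairwise ANI results such as these can be summarized, the following Python sketch parses FastANI-style tab-separated rows (query, reference, ANI, followed by fragment counts) and reports the most similar genome for a query. The file names are hypothetical placeholders, only the ANI values quoted above are used, and the fragment-count columns are omitted from the example rows because they are not reported in the text.

```python
from collections import defaultdict


def parse_fastani(lines):
    """Parse tab-separated rows into a nested {query: {reference: ANI}} dict.

    Only the first three columns (query, reference, ANI) are read; any
    trailing columns, such as fragment counts, are ignored.
    """
    ani = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        query, ref, identity = line.split("\t")[:3]
        ani[query][ref] = float(identity)
    return dict(ani)


def most_similar(ani, query):
    """Return (reference, ANI) for the best non-self hit of `query`."""
    hits = {ref: val for ref, val in ani[query].items() if ref != query}
    best = max(hits, key=hits.get)
    return best, hits[best]


# Hypothetical file names; ANI values are those quoted in the text.
rows = [
    "crispum.fa\tdentireticulatum.fa\t90.64",
    "crispum.fa\tFACHB-3311.fa\t90.56",
    "dentireticulatum.fa\tFACHB-3311.fa\t92.57",
    "capilliforme.fa\tcardiacum.fa\t97.03",
]
table = parse_fastani(rows)
print(most_similar(table, "crispum.fa"))
```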

IR expansion and contraction

The IR boundary regions of the nine species of Oedogoniales were compared as illustrated in Fig. 4. Oe. cardiacum and Oe. capilliforme (strain FACHB-3312) showed larger IRs reaching 35,000 bp, whereas Oe. crispum and O. prescottii exhibited smaller IRs reaching 13,284 and 12,808 bp, respectively. The IRs of all the nine cp genomes contained the same four protein-coding genes, three tRNAs, and three rRNAs. However, in Oe. crispum and Oe. sp. (strain FACHB-3313), an additional trnR(ccg) was observed between psbA and rbcL; Oe. cardiacum included two additional protein-coding genes (int and dpoB) and one tRNA (trnR(ccu)); and the IRa of four cp genomes included parts of the 5′-end of ccsA (390 bp in Oe. cardiacum, Oe. capilliforme, and O. prescottii and 383 bp in Oe. crispum).

Fig. 4

Comparison of the IR-SC boundaries among nine Oedogoniales species

The nine Oedogoniales cp genomes showed high conservation at four regional boundaries, with little variation. The LSC/IRb junctions (JLBs) in the cp genomes of Oe. cardiacum, Oe. capilliforme, O. prescottii, and Oe. sp. (strain FACHB-3311) were located in trnR(ucu); as a result, 2 bp of the 3′-end of this gene were a part of the IR region. In Oe. sp. (strain FACHB-3311), the IR region contained 6 bp of the 3′-end of trnR(ucu), and in the other five cp genomes, the LSC/IRb boundaries occurred between trnR(ccu) and psbA. The IRb/SSC boundaries in all the nine cp genomes occurred between trnL(caa) and psaA, and the SSC/IRa junctions were located in rpoA. The IRa/LSC junctions (JSAs) of the two O. carolinianum cp genomes occurred between psbA and ccsA, while those of the other seven genomes were located in ccsA, with 390 bp of the 5′-end of this gene being a part of the IR region in Oe. cardiacum, Oe. capilliforme, and O. prescottii, and 383, 388, 608, and 389 bp of the 5′-end of this gene being a part of the IR region in Oe. crispum, strain FACHB-3313, strain FACHB-3311, and Oe. dentireticulatum, respectively.

Phylogenetic analysis and adaptive evolution analysis

Phylogenetic analyses based on 54 cp protein-coding genes were conducted using maximum likelihood (ML) and Bayesian methods with amino acid and nucleotide datasets, which generated two phylogenetic trees with largely consistent results (Figs. 5 and 6). The trees based on the amino acid and nucleotide datasets both indicated that the nine species of Oedogoniales clustered into three clades: Oe. sp. (MW250873) formed a separate clade with very high support, the two O. carolinianum clustered together and formed another clade, and the other six Oedogoniales formed the third clade. Within the third clade, the two datasets differed slightly in the placement of O. prescottii. Based on the nucleotide dataset, O. prescottii clustered with Oe. cardiacum and Oe. capilliforme, whereas according to the amino acid dataset, O. prescottii clustered with Oe. dentireticulatum, Oedogonium sp. (MW250875), and Oe. crispum. A total of 26 taxa, including the five newly added Oedogonium taxa, were included in the 18S rDNA phylogeny (Supplementary Fig. S7). The phylogenetic tree showed that the species of Bulbochaete were separated from Oedogonium and Oedocladium with very high support, that the species of Oedocladium formed two branches separated by two species of Oedogonium, and that the five newly included Oedogonium species were separated from each other and distributed among the other small clades. All these results indicated that both Oedocladium and Oedogonium are polyphyletic, which is in accordance with a previous study [36].

Fig. 5

Phylogenetic tree inferred from the amino acid dataset of 54 chloroplast genes. Numbers on the left and right sides at the branches represent Bayesian posterior probabilities and bootstrap values, respectively. Scale bar indicates substitutions per site

Fig. 6

Phylogenetic tree inferred from the nucleotide dataset of 54 chloroplast genes. Numbers on the left and right sides at the branches represent Bayesian posterior probabilities and bootstrap values, respectively. Scale bar indicates substitutions per site

Using the ML method on 54 chloroplast protein-coding genes, dN and dS values were compared between terrestrial and aquatic species of Oedogoniales (Supplementary Table S3). No gene differed significantly between the two groups of algae in dN or dS. Because the ML method is a pairwise approach to estimating the dN/dS ratio, an elevated ratio may reflect changes in one or both species, and specific sites under positive selection may remain undetected [37]. Positive selection analysis was therefore performed with the branch-site model, and the null and alternative models were compared. The null model assumed no positive selection on the foreground branch (dN/dS fixed at 1), and the alternative model assumed that sites on the foreground branch have dN/dS > 1 (positive selection). When the two Oedocladium species and MW250875 were labelled as the foreground branch, the FDR-adjusted P value of psbA was less than 0.05 (Table 3). Bayes empirical Bayes (BEB) analysis indicated that psbA may contain sites under positive selection, with 291SER showing a posterior probability higher than 99%. However, owing to the lack of functional site information in UniProt for closely related species such as Chlamydomonas reinhardtii and Stigeoclonium helveticum, the positively selected sites of psbA require further investigation.

Table 3 Positively selected sites in terrestrial Oedogoniales species (Oedocladium species and Oedogonium sp. (MW250875))

Discussion

In this study, we investigated five Oedogonium isolates from China, of which strains FACHB-3309, FACHB-3310, and FACHB-3312 were identified as Oe. dentireticulatum, Oe. crispum, and Oe. capilliforme, respectively. Strains FACHB-3311 and FACHB-3313 were assigned to the genus Oedogonium owing to their unbranched filaments; however, they could not be identified at the species level because their complete sexual characters were not observed.

Comparative analyses of the nine Oedogoniales cp genomes showed highly conserved structures and gene numbers. The cp genomes of the five newly sequenced Oedogonium species shared the same structure as the previously reported Oedogoniales cp genomes, and the quadripartite structure was not altered; this differs from the other two orders in the OCC clade (the IR has been lost in the reported cp genomes of Chaetophorales and of Floydiella in Chaetopeltidales). It has been suggested that IR loss may be a synapomorphy marking the common ancestry of Chaetophorales and Chaetopeltidales [38]. The total length of these cp genomes varied within a relatively large range, from 146,367 bp (Oe. crispum) to 204,438 bp (O. carolinianum), which may result from contraction and expansion of the IR regions and from the proportion of non-coding sequences, such as intron content. Furthermore, the nine cp genomes showed highly conserved numbers of protein-coding genes and rRNAs; however, they differed slightly in tRNA content. The nine cp genomes varied considerably in intron content, and the number of group I introns differed markedly, mainly owing to the diversity of introns in psbA. In particular, introns (group II) were observed for the first time in chlB and chlL in Oe. dentireticulatum. All nine cp genomes retained the group I intron in trnL(uaa) and the group II introns in petD and psaC, and shared the same insertion sites. For the other intron-containing genes, the insertion sites showed both similarities and variations among species.

Synteny analyses revealed a relatively high degree of syntenic conservation among the nine cp genomes, and only one inverted segment was detected, in O. carolinianum FACHB-2453 and O. carolinianum UTEX LB 1686. The other variations were mainly attributable to introns, and no structural variation was observed among the six Oedogonium species. The FastANI results also supported the findings of the synteny analyses, indicating that Oe. capilliforme had high similarity with Oe. cardiacum, and Oe. dentireticulatum resembled strain FACHB-3311.

IR regions are the most conserved regions in the cp genomes. Frequent expansions and contractions at the junctions of SSC and LSC with IRs illustrate the relationships among the taxa and have been recognized as evolutionary signals [39,40,41,42,43]. The nine species of Oedogoniales examined in the present study showed only a few variations at the junctions. When compared with the two O. carolinianum, O. prescottii showed higher similarities to the five Oedogonium species, and the five Oedogonium species were similar to each other. The IR regions of O. prescottii and Oe. crispum presented a contraction, when compared with those of the other Oedogoniales taxa, and the cp genomes of both O. prescottii and Oe. crispum exhibited the shortest length. Previous studies have indicated that IR expansion and contraction frequently result in variations in genome size, which can be applied to phylogenetics and genome evolution analyses [40, 41, 44], and gene conversion during speciation is considered to be responsible for small IR expansions or contractions [40, 41, 45,46,47].

Phylogenetic analyses based on 54 cp protein-coding genes (ML and Bayesian methods with amino acid and nucleotide datasets) and on 18S rDNA all showed that Oedocladium and Oedogonium are polyphyletic, in accordance with previous reports. However, support at the basal node was not very high for the nucleotide dataset and 18S rDNA, probably owing to the lack of sufficient representative taxa for this group as well as the different evolutionary rates of amino acid and nucleotide sequences. Previous studies have proposed that larger sample sizes can substantially improve phylogenetic results [48].

Positively selected genes are known to play a key role in adaptation to different environments and in speciation [49,50,51,52,53], and it is necessary to understand the adaptive evolutionary history of Oedocladium species. The results of the present study showed that 291SER of psbA may be under positive selection, with a posterior probability higher than 99%. The genus Oedocladium (terrestrial) is presumed to have partly originated from Oedogonium species, which grow on moist soil surfaces and present underground filaments with slightly unbranched rhizoids [9]. The psbA gene encodes the photosystem II reaction center protein D1, which is one of the two reaction center proteins of photosystem II. Photosystem II is the first link in the chain of photosynthesis; it captures photons and uses the energy to extract electrons from water molecules [54]. It has been reported that genes in the cp genome (including psbA) of Curcuma sp. show adaptive evolution in response to changes in light conditions [55], and that the green alga Chlamydomonas sp. ICE-L underwent adaptive evolution in response to the extreme polar environment [27]. We speculate that the Oedocladium species and terrestrial Oedogonium species could have partly originated from aquatic Oedogonium species, and might have undergone adaptive evolution during this process in response to the difference in light intensity between aquatic and terrestrial habitats. Nevertheless, more genomic data, especially for terrestrial species, are needed to verify these hypotheses and to further understand the phylogenetic and evolutionary relationships of the order Oedogoniales.

Conclusion

The present study determined the cp genomes of five Oedogonium species and revealed that the overall structure and gene content of the Oedogoniales cp genomes were relatively conserved, except for some variations in genome size, AT content, introns, and repeats. Phylogenetic analyses based on 54 cp protein-coding genes and on 18S rDNA both indicated that Oedogonium and Oedocladium are polyphyletic. A positively selected gene was identified in the two Oedocladium species, and the terrestrial Oedogonium species were speculated to have undergone adaptive evolution in response to the difference in light intensity between aquatic and terrestrial habitats. These findings not only strengthen our understanding of Oedogoniales cp genomes, but also help us to comprehend the phylogenetic and evolutionary relationships of the order Oedogoniales.

Methods

Sampling, culture conditions, DNA extraction, and morphological observation

The strains described in this study were isolated from water or soil samples and have been deposited in the Freshwater Algae Culture Collection at the Institute of Hydrobiology (FACHB collection), Wuhan, Hubei Province, China. Strain FACHB-3309 was collected from a paddy field in Hechuan (29°50′15″ N, 106°12′46″ E), Chongqing Province, China, in March 2019. Strain FACHB-3310 was collected from a pond in Lvliang (37°34′20″ N, 112°12′29.25″ E), Shanxi Province, China, in July 2018. Strain FACHB-3311 was collected from a pond in Wuhan (30°3′46″ N, 114°23′56″ E), Hubei Province, China, in June 2019. Strain FACHB-3312 was collected from a ditch in Wuhan (30°33′2″ N, 114°25′48″ E), Hubei Province, China, in April 2019. Strain FACHB-3313 was collected from damp soil in a park in Haikou (20°2′23″ N, 110°21′1″ E), Hainan Province, China, in December 2018. All the strains were grown at 25 °C in liquid BG11 medium under a 12/12-h light/dark cycle. Genomic DNA was extracted using a Universal DNA Isolation Kit (Axygen, Suzhou, China) [56]. An Olympus BX53 (Tokyo, Japan) light microscope equipped with an Olympus DP80 digital camera and CellSens standard image analysis software (Tokyo, Japan) was used for morphological examination. The characteristics of the five species are summarized in Table 1.

Library preparation, sequencing, genome assembly, and annotation

A NEB Next Ultra DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA) was used for preparing sequencing libraries, which were sequenced on an Illumina NovaSeq 6000 platform by a commercial provider (Novogene, Beijing, China). The methods of genome assembly and annotation have been described elsewhere [36, 57]. The raw data were trimmed using SOAPnuke software [58] to remove low-quality and adapter sequences (the reads of the five species had a mean length of about 150 bp) and then assembled using SPAdes [59]. The resulting contigs were considered to have originated from the cp genome if (1) BLAST searches against publicly available cp genomes returned Chlorophyta species with significant e-values (1e-5); (2) the GC content of the contig was less than 45% (the GC content of previously sequenced green algal cp genomes is typically less than 45%); and (3) the sequencing depth was more than 100-fold. Subsequently, trimmed clean reads were aligned to the resulting contigs using BWA-MEM [60]. If reads mapped to two contigs, the order of the contigs was determined and one sequence was produced, which was confirmed by Sanger dideoxy sequencing. The cp genomes were initially annotated using GeSeq [61] with the reported Oedogoniales cp genomes as references. Protein-coding and ribosomal RNA genes were further polished using BLAST with genes from the available Oedogoniales cp genomes. The tRNA genes were identified using tRNAscan-SE [62]. BLAST was used to refine the annotation results. Intron boundaries were determined by comparing intron-containing genes with homologs without introns, and intron subgroup affiliation was determined by modelling intron secondary structures [63, 64] using the RNAweasel tool [65].
Forward and palindromic repeats larger than 30 bp were searched using Vmatch software (http://www.vmatch.de/) with the options -f -p -l -h -allmax and masked in the genome sequence by RepeatMasker (http://repeatmasker.org) running under the NCBI/RMBLAST (2.9.0+) search engine (http://blast.advbiocomp.com). The annotated sequences have been deposited to the NCBI GenBank database under the accession numbers MW250871–MW250875 (corresponding to strains FACHB-3309–FACHB-3313, respectively). Genome maps were generated using OrganellarGenomeDRAW [66].
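The three criteria used above to select contigs of chloroplast origin can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in this study: the `Contig` class, its field names, and the function names are hypothetical, and the BLAST e-value and read depth are assumed to have been computed beforehand by external tools.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contig:
    name: str
    sequence: str
    depth: float                   # mean sequencing depth (fold coverage)
    best_evalue: Optional[float]   # best BLAST e-value against chlorophyte
                                   # cp genomes; None if there was no hit


def gc_content(seq: str) -> float:
    """Return the GC content of a sequence as a percentage."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def is_cp_candidate(contig: Contig, max_evalue: float = 1e-5,
                    max_gc: float = 45.0, min_depth: float = 100.0) -> bool:
    """Apply the three criteria: BLAST hit, GC < 45%, depth > 100-fold."""
    return (contig.best_evalue is not None
            and contig.best_evalue <= max_evalue
            and gc_content(contig.sequence) < max_gc
            and contig.depth > min_depth)


# Hypothetical contigs for illustration only.
contigs = [
    Contig("contig_1", "ATATATGC" * 10, 250.0, 1e-50),   # passes all criteria
    Contig("contig_2", "GCGCGCAT" * 10, 250.0, 1e-50),   # GC content too high
    Contig("contig_3", "ATATATGC" * 10, 20.0, 1e-50),    # depth too low
]
print([c.name for c in contigs if is_cp_candidate(c)])
```

Treating the e-value threshold as inclusive is a choice made for this sketch; the text specifies only the cutoff value.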

Phylogenetic analysis

Phylogenetic analysis of the algal strains was performed using cp protein-coding gene sequences (amino acid and nucleotide datasets) and the 18S rDNA. The amino acid and nucleotide datasets of the cp genomes were concatenated from the following 54 protein-coding genes: atpA, atpB, atpE, atpF, atpH, atpI, cemA, chlB, chlL, chlN, clpP, petB, petD, petG, petL, psaA, psaB, psaC, psaJ, psbA, psbB, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ, rpl14, rpl16, rpl2, rpl20, rpl23, rpl36, rpl5, rps11, rps12, rps14, rps18, rps19, rps2, rps3, rps7, rps8, rps9, tufA, ycf12, ycf3, ycf4. The amino acid sequences were aligned using MAFFT 7.0 [67], and those employed in the nucleotide dataset were additionally aligned using the MUSCLE function of MEGA7 [68] with the option “align codons” [69]. Ambiguous regions were removed from each alignment using trimAl 1.2 [70] with the option gt = 1. Evolutionary models and partitions of the datasets were determined using PartitionFinder 2 [71], and the best partitions are shown in Table 4. ML and Bayesian analyses were used for inferring phylogenies. The IQ-TREE web server [72] was employed to perform ML analysis with 1000 ultrafast bootstraps [73] and 1000 SH-aLRT tests [73, 74] to examine nodal support. Bayesian analysis was conducted using MrBayes v3.2.6 [75], and the dataset was partitioned as shown in Table 4. Markov chain Monte Carlo analyses were run with four Markov chains (three heated, one cold) for 1,000,000 generations, and trees were sampled every 1000 generations. In each round of calculation, a fixed number of samples (burn-in = 1000) was discarded from the beginning of the chain. Reference sequences were downloaded from GenBank. 18S rDNA sequences were aligned using MAFFT 7.0 [67], and ambiguous regions were manually edited and adjusted by eye using MEGA7 [68].
For the 18S rDNA dataset, Bayesian inference (BI) was performed in MrBayes v3.2.6 [75], and the evolutionary model was selected using jModelTest2, with GTR + I + G as the best model [76]. An alignment of the cp genome sequences of all the species of Oedogoniales was generated using Mauve ver. 2.3.1 in progressive mode [77]. FastANI [78] was employed to determine the ANI of all the cp genomes.
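The concatenation of per-gene alignments into a single dataset with gene boundaries, as used for the partitioned analyses above, can be sketched in Python as follows. This is an illustrative sketch and not the procedure used in this study: the function name `concatenate`, the input layout (a dict of gene name to per-taxon aligned sequences), and the padding of missing genes with gap characters are all assumptions.

```python
def concatenate(alignments, taxa, gap="-"):
    """Concatenate per-gene alignments into one supermatrix.

    alignments: dict mapping gene name -> {taxon: aligned sequence}
    taxa: list of taxon names to include, in output order
    Returns (supermatrix, partitions), where partitions is a list of
    (gene, start, end) tuples with 1-based inclusive coordinates.
    A taxon missing from a gene alignment is padded with gap characters.
    """
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are empty or differ in length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, gap * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy alignments with hypothetical taxa, for illustration only.
genes = {
    "atpA": {"taxon_A": "ATG---", "taxon_B": "ATGAAA"},
    "psbA": {"taxon_A": "ATGGCT"},
}
matrix, parts = concatenate(genes, ["taxon_A", "taxon_B"])
print(matrix["taxon_B"])
print(parts)
```

The coordinate list can then be merged into the partition scheme chosen by a model-selection tool such as PartitionFinder.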

Table 4 Partition scheme of 54 concatenated chloroplast protein-coding genes used in this study

Evolutionary analysis

The CODEML program of PAML v4.9 [30] with the pairwise ML method (runmode = -2, CodonFreq = 2) was used to estimate dS and dN for the 54 chloroplast protein-coding genes. Comparisons of the evolutionary rates were conducted using the two-tailed Wilcoxon rank sum test, and multiple testing was corrected by applying the false discovery rate (FDR) method [79]. The branch-site model was used to find genes that possibly underwent positive selection. The improved branch-site model (model = 2, NSsites = 2) was used to detect signatures of positive selection on individual codons in a specific branch [80]. The three Oedocladium species and the terrestrial Oedogonium sp. (strain FACHB-3313) were set as the foreground branch. The null model assumed that no positive selection occurred on the foreground branch (fix_omega = 1, omega = 1), and the alternative model assumed that sites on the foreground branch were under positive selection (fix_omega = 0, omega = 2). Likelihood ratio tests (LRTs) were used to compare model fit, and P values were obtained with the chi-square test. A correction for multiple testing was performed using an FDR criterion, and the BEB method was employed to statistically identify sites under positive selection. Genes with an FDR-adjusted P < 0.05 were considered putatively selected.
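The likelihood ratio test and FDR correction described above can be sketched in Python using only the standard library. The sketch assumes one degree of freedom for the chi-square test, which is a common choice for the branch-site test but is not stated in the text, and it implements the Benjamini-Hochberg procedure as one common FDR method. The function names and the log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue(lnl_null: float, lnl_alt: float) -> float:
    """Likelihood ratio test P value, assuming chi-square with 1 df.

    The statistic is 2 * (lnL_alt - lnL_null), truncated at zero.
    For 1 df the chi-square survival function equals erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical log-likelihoods (null, alternative) per gene.
lnl = {"geneA": (-2500.0, -2492.0), "geneB": (-1800.0, -1799.5),
       "geneC": (-3100.0, -3100.0)}
names = list(lnl)
raw = [lrt_pvalue(*lnl[g]) for g in names]
adj = benjamini_hochberg(raw)
print([g for g, q in zip(names, adj) if q < 0.05])
```

Only genes whose adjusted P value falls below 0.05 would then be passed to the BEB step to identify candidate sites.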