Chenopodiaceae is among the large families of angiosperms that mainly include Spinacia oleracea, Chenopodium quinoa Willd, and Beta vulgaris [1,2,3]. Chenopodiaceae plants are mostly annual herbs, half shrubs, shrubs, living in the desert, and saline soil areas. Therefore, they often show xerophytic adaptation. As an annual herb of Chenopodiaceae, S. glauca grows in saline-alkali land and beaches. It displays a strong salt tolerance and drought tolerance capacity and has high value as medicine and food material [4,5,6]. Moreover, S. glauca possesses immense ecological importance as it can tolerate heavy metals at higher levels and could be used as a super accumulator of heavy metals. The environmental protection and remediation of contaminated soil make it a natural resource with significant economic and ecological importance [7].

Plant mt is involved in numerous metabolic processes related to energy generation and the synthesis and degradation of several compounds [8]. Margulis’ endosymbiosis theory suggests that mt originated from archaea living in nucleated cells when eukaryotes swallowed the bacteria. Later it evolved into organelles with special functions during the long-term symbiosis [9,10,11], incorporated as an additional mt genome. Mitochondria convert biomass energy into chemical energy through phosphorylation and provide energy for life activities. Besides, it is involved in cell differentiation, apoptosis, cell growth, and cell division [12,13,14,15]. Therefore, mitochondria play a crucial role in plant productivity and development [16]. For most seed plants, nuclear genetic information is inherited from both parents, while cp and mt are inherited from the maternal parent. This genetic mechanism eliminates the paternal lines’ influence, thus reducing the difficulty of genetic research and facilitating the study of genetic mechanisms [17].

With the development of sequencing technology, an increasing number of mt genomes have been reported. Up to Jan. 2021, 351 complete mt genomes have been deposited in GenBank Organelle Genome Resources. Long periods of mutualism leave mitochondria with some of their original DNA lost, and some of them transferred, leaving only the DNA that codes for it [18, 19]. Mt DNA has long been recognized as tending to integrate DNA from various sources through intracellular and horizontal transfer [20]. Therefore, the mt genome in plants has significant differences in length, gene sequence, and gene content [21]. The mt genome length of the smallest known terrestrial plant is about 66 Kb, and the largest terrestrial plant mt genome length is 11.3 Mb [22, 23]. As a result, the amount of genes in terrestrial plants varies widely, typically between 32 and 67 [24]. In this study, we sequenced and annotated the mt genome of S. glauca and compared it with the genomes of other angiosperms (as well as gymnosperms), which provides additional information for a better understanding of the genetics of the halophyte S. glauca.


Genomic features of the S. glauca mt genome

The S. glauca mt genome is circular with a length of 474,330 bp. The base composition of the genome is A (28.00%), T (27.93%), C (21.62%), G (22.45%). There are 61 genes annotated in the mt genome, including 27 protein-coding genes, 29 tRNA genes, and 5 rRNA genes. The functional categorization and physical locations of the annotated genes were presented in Fig. 1. According to our findings, the mt genome of S. glauca encodes 26 different protein (nad7 has two copies) that could be divided into 9 classes (Table 1): NADH dehydrogenase (7 genes), ATP Synthase (5 genes), Cytochrome C Biogenesis (4 genes), Cytochrome C oxidase (3 genes), Ribosomal proteins (SSU) (3 genes), Ribosomal proteins (LSU) (1 gene), Transport membrane protein (1 gene), Maturases (1 gene), and Ubiquinol Cytochrome c Reductase (1 gene). The homologs of S. glauca mt genes in the mt genomes of H. sapiens, S. cerevisiae, and A. thaliana were identified and listed in Table S1. All of the protein-coding genes used ATG as starting codon, and all three stop codons TAA, TGA, and TAG were found with the following utilization rate: TAA 44.4%, TGA 37.04%, and TAG 18.52% (Table S2). It is reported that the mt genomes of land plants contain variable number of introns [25]. In the mt genome of S. glauca, there are 8 intron-containing genes (nad2, nad5, nad7 with two copyies, cox2, ccmFc, trnA-UGC, and trnV-AAC) harboring 15 introns in total with a total length of 16,743 bp. The intron lengths varied from 105 bp (trnV-AAC) to 2103 bp (nad2). The gene nad7 has two copies in the mt genome, and each copy contains 4 introns, which is the highest intron number. The trnV-AAC, instead, contains only one intron with a length of 105 bp, which is the smallest intron.

Fig. 1
figure 1

The circular map of S. glauca mt genome. Gene map showing 61 annotated genes of different functional groups

Table 1 Gene profile and organization of S. glauca mt genome

It has been reported that most land plants contain 3 rRNA genes [9, 11]. Consistently, three rRNA genes rrn5 (119 bp), rrnS (1303 bp), and rrnL (1369 bp) were annotated in S. glauca mt genome. Besides, 20 different transfer RNAs were identified in S. glauca mt genome transporting 18 amino acids, since more than one transfer RNAs might transport the same amino acid for different codons. For example, trnS-UGA and trnS-GCU transport Ser for synonymous codons UCA and AGC, respectively. Moreover, we observed that transfer RNA trnF-GAA, trnM-CAU, and trnN-GUU have two different structures with the same anticodon. Taking trnM-CAU as an example, both A and B structures share the same anticodon CAU transporting amino acid Met (Figure S1).

Repeat sequences anaysis

Microsatellites, or simple sequence repetitions (SSRs), are DNA fragments consisting of short units of sequence repetition of 1–6 base pairs in length [26]. The uniqueness and the value of microsatellites are due to their polymorphism, codominant inheritance, relative abundance, extensive genome coverage, and simplicity in PCR detection [27]. SSRs in the mt genome of S. glauca were identified with Tandem Repeats Finder software [28]. As a result, 361 SSRs were found in the mt genome of S. glauca, and the proportion of different forms were shown in Figure S2. SSRs in monomer and dimer forms accounted for 78.67% of the total SSRs present. Adenine (A) monomer repeats represented 46.28% (56) of 121 monomer SSRs, and AT repeat was the most frequent type among the dimeric SSRs, accounting for 58.15%. There are only two hexameric SSRs presented in S. glauca mt genome, located between nad4L and cox2, and between trnQ-UUG and trnM-CAU. The specific locations of pentamer and hexamer are shown in Table 2. Tandem repeats, also named satellite DNA, refer to the core repeating units of about 1 to 200 bases, repeated several times in tandem. They are widely found in eukaryotic genomes and in some prokaryotes [29]. As shown in Table 3, a total of 12 tandem repeats with a matching degree greater than 95% and a length ranging from 13 bp to 38 bp were present in the mt genome of S. glauca. The non-tandem repeats in S. glauca mt genome were also detected using REPuter software [30]. As a result, 928 repeats with the length equal to or longer than 20 were observed, of which 483 were direct, and 445 were inverted. The longest direct repeat was 30,706 bp, while the longest inverted repeat was 12,556 bp (Supplementary data sheet 1). The length distribution of the direct and inverted repeats are shown in Fig. 2. It is shown that the 20–29 bp repeats are most abundant for both repeat types.

Table 2 Distribution of penta and hexa SSRs in S. glauca mt genome
Table 3 Distribution of perfect tandem repeats in S. glauca mt genome
Fig. 2
figure 2

The repeats in S. glauca mt genome. a The synteny between the mt genome and its forward copy showing the direct repeats. b The synteny between the mt genome and its reverse complementary copy showing the inverted repeats. c The length distribution of reverse and inverted repeats in S. glauca mt genome. The number on the histograms represents the repeat number of designated lengths shown on the horizontal axis

The prediction of RNA editing

RNA editing refers to the addition, loss, or conversion of the base in the coding region of the transcribed RNA [31], found in all eukaryotes, including plants [32]. In chloroplast and mitochondrion, the conversion of specific cytosine into uridine alters the genomic information [33]. This process improves protein preservation in plants by modifying codons. Without the support of the proteomics data, it is impossible to detect accurate RNA editing. However, Mower’s software PREP could be used to computationally predict the RNA edit site [34]. In this analysis, 216 RNA editing sites within 26 protein-coding genes (Table 4) were predicted in the mt genome of S. glauca, using PREP-MT program (Fig. 3). Among those protein-coding genes, cox1 does not have any editing site predicted, while ccmB has the most editing sites predicted (29). Of those editing sites, 35.19% (76) were located at the first position of the triplet codes, 63.89% (138) occurred with the second base of the triplet codes. And there was a particular editing case in which the first and second positions of the triplet codes were edited, resulting in an amino acid change from the original proline (CCC) to phenylalanine (TTC). After the RNA editing, the hydrophobicity of 42.13% of amino acids did not change. However, 45.83% of the amino acids were were predicted to change from hydrophilic to hydrophobic, while 11.11% were predicted to change from hydrophobic to hydrophilic. The RNA editing might lead to the premature termination of protein-coding genes, and this phenomenon is likely to occur with atp4 and atp9 in S. glauca mt genome. Our results also showed that the amino acids of predicted editing codons showed a leucine tendency after RNA editing, which is supported by the fact that the amino acids of 47.69% (103 sites) of the edits were converted to leucine (Table 4).

Table 4 Prediction of RNA editing sites
Fig. 3
figure 3

The distribution of RNA-editing sites in S. glauca mt protein-coding genes. The gray bars represent the number of RNA-editing sites of each gene

DNA migration from chloroplast to mitochondria

Thirty-two fragments with a total length of 26.87 kb were observed to be migrated from cp genome to mt genome in S. glauca, accounting for 5.18% of the mt genome. There are 8 annotated genes located on those fragments, all of which are tRNA genes, namely trnA-UGC, trnF-GAA, trnH-GUG, trnI-GAU, trnR-ACG, trnM-CAU, trnN-GUU, and trnV-GAC. Our data also demonstrate that some chloroplast protein-coding genes, i.e. atpA, rrn16, rrn23, rpoC2, ndhA, psaB, and psbB migrated from cp to mitochondrion, even though most of them lost their integrities during evolution, and only partial sequences of those genes could be found in the mt genome nowadays (Table 5). The different destinations of transferred protein-coding genes and tRNA genes suggested that tRNA genes are much more conserved in the mt genome than the protein-coding genes, indicating their indispensable roles in mitochondria.

Table 5 Fragments transferred from chloroplast to mitochondria in S. glauca

Phylogenetic analysis within higher plant mt genomes

To understand the evolutionary status of S. glauca mt genome, the phylogenetic analyses was performed on S. glauca together with other 28 species, including 22 eudicots, 4 monocots, and 2 gymnosperms (designated as outgroups). Abbreviations and the accession number of mt genomes investigated in this study are listed in Table S3. A phylogenetic tree was obtained based on an aligned data matrix of 23 conserved protein-coding genes from these species, as shown in Fig. 4. The phylogenetic tree strongly supports the separation of eudicots from monocots and the separation of angiosperms from gymnosperms. Moreover, the taxa from 13 families (Leguminosae, Cucurbitaceae, Apiaceae, Apocynaceae, Solanaceae, Rosaceae, Caricaceae, Brassicaceae, Salicaceae, Chenopodiaceae, Gramineae, Cycadaceae, and Ginkgoaceae) were well clustered. The order of taxa in the phylogenetic tree was consistent with the evolutionary relationships of those species, indicating the consistency of traditional taxonomy with the molecular classification. Based on the phylogenetic relationships among the 29 species, different groups of plants were selected for further comparative analysis.

Fig. 4
figure 4

The phylogenetic relationships of S. glauca with other 28 plant species. The Neighbor-Joining tree was constructed based on the sequences of 23 conserved protein-coding genes. Colors indicate the families that the specific species belongs

The comparison of mt genome size and GC content between S. glauca and other species

The size and GC content are the primary characteristics of an organelle genome. We compared the size and GC content of S. glauca with other 35 green plants, including 4 phycophyta, 3 bryophytes, 2 gymnosperms, 4 monocots, and 22 dicots. The abbreviations of species names of those plants and the accession numbers of their mt genomes are listed in Table S3. As shown in Fig. 5, the sizes of mt genomes varied from 15,758 bp (C. reinhardtii) to 1,555,935 bp (C. sativus). The sizes of mt genomes of phycophyta and bryophytes were generally smaller compared to land plants, while that of S. glauca (474,330 bp) has an average size. Similarly, the GC contents of the mt genomes were also variable, ranging from 32.24% in S. palustre to 50.36% in G. biloba. In general, the GC contents of angiosperms, including monocots and dicots, are larger than those of bryophytes but smaller than those of gymnosperms, suggesting that the GC contents frequently changed after the divergence of angiosperms from bryophytes and gymnosperms. Interestingly, our results also showed that the GC contents fluctuate widely in phycophyta. In contrast, the GC contents in angiosperms were much conserved during the evolution, although their genome sizes varied tremendously.

Fig. 5
figure 5

The sizes and GC contents of 36 mt plant genomes. The blue dots represent the GC content of the taxa, and the blue trendline shows the variation of GC content across the different taxa. The green dots represent the genome size, and the trendline shows the variation of GC content

Comparison of genome organization with ten green plant mt genomes

The S. glauca mt genome organization was extensively investigated for protein-coding genes, cis-spliced introns, rRNAs tRNAs, and non-coding regions. It was further compared with 10 other taxa, including 3 plants from Chenopodiaceae. As shown in Table 6, protein-coding genes and cis-introns regions represent 5.00% and 3.92% of the whole S. glauca mt genome sequence, respectively. In comparison, the proportions of rRNA and tRNA regions represent only 1.17% and 0.47%, respectively. The other three plants from Chenopodiaceae have similar proportions of protein-coding genes, slightly higher than that of S. glauca. However, the proportions of coding regions were significantly different across families, probably due to the different mt genome sizes.

Table 6 Organization of mt genomes of S. glauca and other ten green plants

Gene duplication and lost in mt genomes of Chenopodiaceae plants

With the rapid development of sequencing technology, an increasing number of complete plant mt genomes were assembled and reported recently, facilitating the comparison analysis of the mt genome features among multiple plant species [35]. As described by Richardson et al., the mt genomes in plants vary considerably in size, gene content, and gene order [21]. The Chenopodiaceae plants have a relatively strong tolerance to biotic stress, especially to salt. Four mt genomes from this family: C. quinoa willd, S. oleracea, B. vulgaris, and S. glauca are already available. To understand whether those four plants have the same gene contents, the protein-coding genes from those 4 mt genomes were compared. As shown in Table S4, the specific gene duplication and gene loss were observed in different species. For example, nad7 was duplicated in S. glauca mt genome, and nad1 and rps7 were duplicated in B. vulgaris mt genome. The C. quinoa has the most intact mt genome, with only one gene (sdh4) loss, while atp4 and ccmC from B. vulgaris ssp, and nad1 and shh4 from S. oleracea were also lost. However, with five genes, nad4, nad6, rps4, rps13, and tatC, gene loss appears more frequent in the mt genome of S. glauca.

The substitution rates of protein-coding genes

The calculation of non-synonymous substitutions (Ka) and synonymous substitutions (Ks) is of great significance for the reconstruction of phylogeny and the understanding of evolutionary dynamics of protein-coding sequences in closely related species [36]. In genetics, Ka/Ks value could be used to determine whether selective pressure existed on a specific protein-coding gene during evolution: Ka/Ks > 1, positive selection; Ka/Ks = 1, neutral selection; and Ka/Ks < 1, negative selection [37]. The 18 protein-coding genes from S. glauca mt genome were compared with the mt genomes of 10 species, A. thaliana (NC_037304), B. vulgaris (NC_015099), C. papaya (NC_012116), G. max (NC_020455), S. suchowensis (NC_029317), Z. mays (NC_008332), C. quinoa Willd (NC_041093), S. oleracea (NC_035618), N. icotiana tabacum (NC_006581), and G. biloba (NC_027976) for Ka/Ks calculation. As shown in Fig. 6, the Ka/Ks values of S. glauca ccmB compared to G. max, S. suchowensis, A. thaliana, N. tabacum, and C. papaya were higher than 1, suggesting a positive selection occurred during evolution. However, the Ka/Ks values of most proteins in S. glauca were less than 1 compared to the other plant species, indicating the negative selections of those genes during evolution. Taken together, we conclude that the mt genes are highly conserved during the evolutionary process in green plants.

Fig. 6
figure 6

The Ka/Ks values of 18 protein-coding genes of S. glauca versus ten species


Mitochondria are the powerhouse of the plants that produce the required energy to carry out life processes. Plant mitochondria possess more complex genomes than animals, with extensive size variations, sequence arrangements, repeat content, and a highly conserved coding sequence [38]. Understanding the mt genome structure is required to unravel its function, replication, inheritance, and evolutionary trajectories [38]. In the current study, we studied the characteristics of the mt genome of S. glauca, a crucial salt tolerance plant with great value as a food source and phytoremediation agent. According to the reported data, most of the mt genome is circular, and few mt genomes are linear such as the mt genome of Polytomella parva [39, 40]. The mt genome of S. glauca reported in this study is circular with 474,330 bp in size.

The repeat sequences widely exist in the mt genome, and these repeats include tandem, short, and large repeats [41, 42]. Previous studies have shown that repeats in mitochondria are vital for intermolecular recombination. For this reason, the repeat sequences play a pivotal role in shaping the mt genome [43]. In this study, the SSRs, longer tandem repeats, and non-tandem repeats were intensively investigated (Fig. 2). The mt genome of S. glauca harbors abundant repeat sequences that might indicate that the intermolecular recombination frequently happens in the mt genome, which dynamically changes the sequence and conformation during the evolution. We also investigated the genome structure and organization of S. glauca in comparison with other land plants. Conclusively, the mt genome characteristics of S. glauca were consistent with those of other terrestrial green plants.

RNA-editing is a posttranscriptional process that occurs in the cp and mt genomes of higher plants, contributing to the better folding of proteins [44]. Investigating the RNA-editing sites helps to understand the gene expression of the cp and mt genes in plants. Previous studies reported approximately 441 RNA-editing sites within 36 genes in Arabidopsis and 491 RNA-editing sites within 34 genes in rice [39, 45]. In this study, 216 RNA-editing sites within 26 genes were identified. The identification of RNA editing sites provides essential clues for predicting gene functions with novel codons. As the cytoplasmic genome, migration of cp DNA to the mt genome occurred during the plant evolution. We found that 32 fragments were transferred from the cp genome to mt with 8 integrated genes, which are all tRNA genes (Table 5). Transfer of tRNA genes from cp to mt is common in angiosperms [44].

Further, we have analyzed the phylogenetic relationship of S. glauca with representative taxa based on the mt genome information. The resulted phylogenetic tree reflected a clear taxonomic relationship among the taxa. We also analyzed GC content of the mt genome in S. glauca along with other green plants. The result supports the conclusion that GC content is highly conserved in higher plants. The Ka/Ks analysis and the comparison of genome features with other plant’s mt genomes provide a comprehensive understanding of plant mt evolution. Generally, most of the results in this study were consistent with previous reports. The genes that undergone neutral and negative selections were also identified in S. glauca. However, most of the protein-coding genes in S. glauca mt had negative selection compared with other selected species, which is consistent with the previous studies, indicating that the protein-coding genes in the mt genome are conserved across the land plants. The ccmB gene is the only gene that underwent positive selection during the evolution.

In crop plants, deciphering and understanding the mt genome is essential for plant breeding. Understanding of mt genome will set a foundation for the evolutionary analysis, cytoplasmic male sterility, and molecular biological information for plant breeding. Even though S. glauca is not a crop plant, its biological significance and edible values are being examined. As a halophytic model plant with prominent salt-tolerance, whose mt genome has not been reported, the accomplishment of the mt genome provides an opportunity to conduct further genomic studies in S. glauca. Therefore, our study provides essential background information for future understanding of this plant [44].


In this study, we assembled and annotated the mt genome of S. glauca and performed extensive analyses based on the DNA sequences and amino acid sequences of the annotated genes. The S. glauca mt genome is circular, with a length of 474,330 bp. 61 genes, including 27 protein-coding genes, 29 tRNA genes, and 5 rRNA genes, were annotated in the genome. The repeats sequences and RNA editing in S. glauca mt genome were analyzed subsequently. The gene conversation between mt and cp genome was also observed in S. glauca by detecting gene migration. Moreover, our result also indicates consistency in molecular and taxonomic classification, besides GC contents in angiosperms, were also found conserved despite their genome sizes that varied tremendously. The Ka/Ks analysis based on code substitution revealed that most of the coding genes had undergone negative selections, indicating the conservation of mt genes during the evolution. This study provides extensive information about the mt genome for S. glauca, facilitating deciphering the salt resistance mechanism in plants.


Plant growth conditions, DNA extraction, and sequencing

The S. glauca seeds were provided by Chunyin Zhang (Yancheng Lvyuan Salt Soil Agricultural Technology Co. Ltd., Yancheng, Jiangsu, Southeast China, Seeds were treated with 0.03% Gibberellin for 24 h and germinated at 25 °C in a growth chamber. The seedlings were planted at 25 °C in the greenhouse with 16/8 h of light-dark photoperiod cycle. Leaves from about 40 days old plants were used for DNA isolation using CTAB method [46]. The DNA sample quality was examined with agarose-gel electrophoresis, and the concentration was measured using Nanodrop instrument (2000c UV-Vis). The qualified samples were sent to the Annoroad Gene Technology ( for Pacbio sequencing.

Assembly and annotation of the mitochondrial genome

The mitochondrial sequences of S. glauca were selected with blast software using the conserved mitochondrial sequences of Beta vulgaris, Spinacia oleracea, and Chenopodium quinoa Willd as queries. The mt genome was assembled using Canu v1.8 with the selected reads [47]. The assembled contigs were polished (Pilon v 1.18) with Illumina reads to correct read errors. The GE-Seq tool on MPI-MP CHLOROBX website [48] ( was used for the mt genome annotation using the mt genomes of the following species as references: Arabidopsis thaliana (NC_037304), Beta vulgaris (NC_002511), Brassica napus (NC_008285), Carica papaya (NC_012116), Chenopodium quinoa Willd (NC_041093), Daucus carota (NC_017855), Glycine max (NC_020455), Nicotiana tabacum (NC_006581), Spinacia oleracea. (NC_035618), and Salix suchowensis (NC_029317) as references. The threshold for protein search identity was 55%, and that of rRNA, tRNA, and DNA search identity was 85%. The annotation results from Ge-Seq were manually adjusted with Mega 7.0 [49]. The output genebank format file was manually confirmed, and the mitochondrial circular map was drawn using Organellar Genome DRAW (OGDRAW) [50].

Analysis of repeated sequences

Microsatellite identification tool was used to detect simple sequence repeats [51] ( The repeats of 1, 2, 3, 4, 5, and 6 bases with 8, 4, 4, 3, 3, and 3 repeats numbers, respectively, were identified in this analysis. The tandem repeats with > 6p repeat unit were detected using Tandem Repeats Finder v4.09 software [28] ( with default parameters. The direct and inverted repeats were detected using REPuter software [30] ( with the minimal repeat size set to 20 bp.

Chloroplast to mitochondrion DNA transformation and RNA editing analyses

DNA migration is common in plants and varies from species to species [52]. This phenomenon occurs during autophagy, gametogenesis, and fertilization [53]. The cp genome of S. glauca (NC_045302.1) was downloaded from NCBI Organelle Genome Resources Database. Blastn software on NCBI was used to identify the protein-coding and tRNA genes transferred from chloroplasts to mitochondria. Screening criteria were set as the matching rate ≥ 70%, E-value ≤ 1e − 10, and length ≥ 40. The editing sites in the mitochondrial RNA of S. glauca were revealed using the mt gene encoding proteins of plants as references. The analysis was conducted on the Plant Predictive RNA Editor (PREP) suite [34] ( with a cut off value of 0.2.

Phylogenetic tree construction and Ka/Ks analysis

The conserved protein-coding genes from mt genomes of S. glauca and other 28 taxa were used for phylogenetic tree construction. The mt genomes were downloaded from NCBI, and the conserved protein-coding genes (atp1, atp4, atp6, atp8, atp9, ccmB, ccmC, ccmFc, ccmFn, cob, cox1, cox2, cox3, matR, nad1, nad2, nad3, nad4L, nad5, nad6, nad7, and nad9) were extracted using TBtool software [54], and then aligned using Muscle software [55]. Subsequently, a Neighbor-joining (NJ) tree was constructed by Mega 7.0 software using the Poisson model with a bootstrap of 1000 [49]. C. taitungensis and G. biloba were designated as the outgroup in this analysis. The synonymous (Ks) and non-synonymous (Ka) substitution rates of the protein-coding genes in S. glauca mt genome were analyzed using ten representative species (Table S3) as references. In this analysis, Mega 7.0 [49] was used for sequence alignment, and DNAsP v.6.12 [56] was used to calculate Ka/Ks.