Introduction

Mitochondria not only convert biomass energy to chemical energy but also participate in metabolic processes associated with the degradation and synthesis of intracellular compounds [1]. According to the endosymbiotic theory, mitochondria originate from ancient endosymbiotic bacteria and possess semi-autonomous mitogenomes capable of encoding certain self-related proteins [2, 3]. During long-term evolution, mitogenomes have established stable regulatory mechanisms with the nuclear genomes [4]. Notably, angiosperm mitogenomes are predominantly maternally inherited, which eliminates the influence of the paternal line and facilitates studies on genetic mechanisms [5].

Compared with those of other eukaryotes, plant mitogenomes are more complex in structure, repeat content, and size [6]. Their sizes and structures vary widely among plant species, which can be attributed to genomic rearrangements during evolution [7]. Moreover, plant mitogenomes are larger than those of other eukaryotes [8] and can differ greatly in size even within a genus [9]. The mitogenomes of terrestrial plants range from 66 kb to 11.3 Mb [10, 11]. Numerous repetitive sequences, including simple sequence repeats (SSRs), tandem repeats, and dispersed repeats, have been discovered within plant mitogenomes and are believed to contribute to their structural variation [12]. Although circular molecules are commonly considered the predominant form of plant mitochondrial genomes, other structures are also observed, such as linear configurations, branched structures, and numerous smaller circular molecules [13]. For example, the complete mitogenome of Coptis chinensis can be mapped as six circular molecules of unequal length [14]. Notably, plant mitochondrial genomes vary not only in structure and size but also in gene content [15]. The number of genes typically ranges from 32 to 67 in most land plants [16]. Gene transfer events, which primarily involve genes encoding ribosomal proteins, further complicate plant mitogenomes [17]. Generally, the transfer frequency of genes from chloroplast genomes to mitogenomes is higher in plants than in other eukaryotes [18]. In summary, these features, particularly the extensive genomic recombination, make the assembly and annotation of plant mitogenomes more challenging than those of other organelle genomes [19].
Currently, only about 500 complete plant mitogenomes have been uploaded to the National Center for Biotechnology Information (NCBI), far fewer than the number of chloroplast genomes available. The large repetitive regions in the mitochondrial genome cannot be accurately resolved using next-generation sequencing alone, and effective resolution of this issue requires integration with third-generation sequencing [20]. Research on plant mitochondrial genomes has advanced continuously owing to significant breakthroughs in sequencing and assembly technologies.

Fritillaria ussuriensis Maxim., commonly known as “Ping Bei” in Chinese, is a perennial herb belonging to the genus Fritillaria (Liliaceae). It is mainly distributed in the lowland regions of northeast China, such as the Changbai Mountain range [21]. The dried bulbs of F. ussuriensis have long been recognized for their therapeutic properties in clearing heat, resolving toxins, and relieving cough and phlegm [22]. For thousands of years, it has been one of the most important antitussive and expectorant drugs in China and other Asian countries [23]. In recent decades, various constituents have been identified in F. ussuriensis [24, 25], of which steroidal alkaloids are the main bioactive components [26]. As a well-known medicinal material in the northeast provinces of China [27], F. ussuriensis was officially recorded in the 2005 edition of the Chinese Pharmacopoeia [28]. The chloroplast genome of F. ussuriensis has been studied [29], but its mitochondrial genome has not. To fill this gap, we obtained sequencing data for the F. ussuriensis mitogenome by combining PacBio and Illumina sequencing, and then assembled and annotated the mitogenome. We subsequently conducted a series of analyses, including assessment of codon usage, identification of gene transfer from the chloroplast genome to the mitogenome, analysis of repeat sequences, prediction of RNA editing sites, analysis of selective pressure, comparative genomics with closely related species, and investigation of phylogenetic relationships. These results will help to better understand the structure and function of the F. ussuriensis mitogenome and provide valuable data for conservation biology, population genetics, and evolutionary studies of this species.

Results

Features of the F. ussuriensis mitogenome

Bandage was used to visualize the structure of the contigs generated by GetOrganelle (Fig. 1A). The assembly graph had multiple nodes representing the contigs, with overlapping regions indicated by connecting lines. Repetitive regions were identified through analysis of coverage depth. Subsequently, we mapped the PacBio long reads to the repetitive regions of the genome, which simplified the assembly into 13 circular contigs (Fig. 1B). Such simplified contigs were referred to as chromosomes in a previous study [30]. The Illumina data were mapped to the assembly, covering every base of the 13 chromosomes with an average depth of approximately 65-fold (Fig. S1). These results provide strong evidence for the accuracy of the mitochondrial genome assembly.

Fig. 1

Schematic diagram of hybrid assembly. A Graphical mitogenome assembly of F. ussuriensis. B 13 circular chromosomes obtained by solving repetitive regions based on PacBio long-reads

The assembled F. ussuriensis mitogenome comprised 13 chromosomes (Fig. 2), totaling 737,569 bp in length, with an average GC content of 45.41%. The length, GC content, and accession number of each chromosome are presented in Table 1. The chromosomes ranged in size from 29,149 bp to 154,202 bp. A total of 55 genes were annotated in the mitochondrial genome of F. ussuriensis, comprising 41 protein-coding genes (PCGs), 12 transfer RNA (tRNA) genes, and 2 ribosomal RNA (rRNA) genes (Table S1). The PCGs included 9 NADH dehydrogenase genes, 5 ATP synthase genes (with atp9 present in two copies), 4 cytochrome c biogenesis genes, 3 cytochrome c oxidase genes, 1 ubiquinol cytochrome c reductase gene, 1 transport membrane protein gene, 1 maturase gene, 11 small ribosomal subunit genes, 4 large ribosomal subunit genes, and 1 succinate dehydrogenase gene. Among these PCGs, ccmFC, nad1, rps10, rps3, and rpl2 each had 1 intron, while cox2, nad4, nad5, and nad7 each had 2 introns. The 12 tRNA genes were trnC-GCA, trnD-GUC, trnQ-UUG, trnE-UUC, trnM-CAU, trnH-GUG, trnW-CCA, trnK-UUU, trnL-GAG, trnN-GUU, trnY-GUA, and trnfM-CAU, and the 2 rRNA genes were rrn5 and rrnS.

Fig. 2

The mitogenome maps of F. ussuriensis. The mitogenome consists of 13 circular chromosomes with different lengths and gene contents

Table 1 The information of the F. ussuriensis mitogenome

Codon usage analysis

The codon usage in the mitochondrial PCGs of F. ussuriensis was analyzed using CodonW. The GC content varied among codon positions, with an average of 43.42%. Specifically, the GC contents at the first, second, and third codon positions (GC1, GC2, and GC3) were 48.23%, 43.31%, and 38.72%, respectively. This uneven distribution of bases suggests a bias in codon composition. The effective number of codons (ENC) for these mitochondrial genes ranged from 39.65 to 60.00, with an average of 53.72 (Table S2). Only 4 genes had an ENC value below 45, indicating that codon usage bias in the mitochondrial genome was relatively weak. As shown in Table S3 and Fig. 3, alanine preferentially used the GCU codon, which had a relative synonymous codon usage (RSCU) value of 1.60, the highest among these PCGs. A total of 30 codons had RSCU values exceeding 1.00, indicating that these codons were used relatively frequently. Of these, 27 codons had A or U at the third position (11 ending with A and 16 ending with U), accounting for 90.00% of the total. This finding suggests a preference for A-ended or U-ended codons in the F. ussuriensis mitochondrial genome.
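The position-specific GC contents reported above can be illustrated with a minimal sketch. It assumes an in-frame coding sequence given as a plain string; the function name `gc_by_codon_position` is hypothetical, and CodonW's own definitions (for example, of GC3s) may differ in detail from this simple count.

```python
def gc_by_codon_position(cds):
    """Return GC percentages at codon positions 1, 2 and 3 of an in-frame CDS.

    Illustrative only: every codon is counted, including start and stop codons.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    percentages = []
    for offset in range(3):
        bases = cds[offset:usable:3]  # every third base, starting at this position
        gc = sum(base in "GC" for base in bases)
        percentages.append(100.0 * gc / len(bases) if bases else 0.0)
    return tuple(percentages)


# Toy sequence with two codons (ATG, GCC)
print(gc_by_codon_position("ATGGCC"))
```

As a consistency check on the reported values, the mean of 48.23%, 43.31%, and 38.72% is 43.42%, which matches the stated average GC content.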

Fig. 3

Codon preference of F. ussuriensis mitogenome

Repeat sequence analysis

In the F. ussuriensis mitogenome, we identified a total of 192 SSRs (Fig. 4), and 103 of them consisted of tetrameric repeats, accounting for 53.65% of the count. Monomeric and dimeric repeats had similar numbers, with 37 and 36, respectively, while there were only 13 trimeric repeats. The content of pentameric repeats was relatively low, with only 3 repeats (1.56%), and no hexameric repeats existed. In the 37 monomeric repeat sequences, 36 of them were repeat type A or T, accounting for 97.30% of the total.

Fig. 4

The SSRs identified in the F. ussuriensis mitogenome

As shown in Fig. 5A, a total of 4,270 dispersed repeat sequences were identified in the F. ussuriensis mitogenome, including 2,160 forward repeats, 32 reverse repeats, 2,061 palindromic repeats, and 17 complement repeats. These repeat sequences ranged in size from 30 to 418 bp. Of them, 22 and 8 repeat sequences exceeded 100 and 200 bp in length, respectively. In total, 3,815 (70.66%) dispersed repeat sequences fell within the length range of 30–34 bp (Fig. 5B). The dispersed repeat sequences totaled 139,575 bp in length, accounting for 18.92% of the complete mitogenome sequence.
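The four categories of dispersed repeats named above differ in how the second copy relates to the first. The following sketch illustrates that relationship for two equal-length sequences; it is not REPuter's algorithm, and the function names and the `max_mismatches` parameter are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def classify_repeat(first, second, max_mismatches=0):
    """Classify how `second` relates to `first` (illustrative sketch only).

    forward:     second matches first as written
    reverse:     second matches first read backwards
    complement:  second matches the complement of first
    palindromic: second matches the reverse complement of first
    """
    first, second = first.upper(), second.upper()
    if len(first) != len(second):
        return None
    candidates = [
        ("forward", first),
        ("reverse", first[::-1]),
        ("complement", first.translate(COMPLEMENT)),
        ("palindromic", first.translate(COMPLEMENT)[::-1]),
    ]
    for label, target in candidates:
        if hamming(target, second) <= max_mismatches:
            return label
    return None


print(classify_repeat("ACGTT", "AACGT"))
```

Setting `max_mismatches=3` would mirror the Hamming distance reported in the Methods, although real repeat finders search the whole genome rather than comparing two preselected segments.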

Fig. 5

The dispersed repeats identified in F. ussuriensis mitogenome. A Type of the dispersed repeats. B Distribution of the lengths of dispersed repeats

Prediction of RNA editing sites

In the mitogenome of F. ussuriensis, 505 RNA editing sites were predicted, all of which were of the C-to-U type. The frequency of RNA editing was closely correlated with gene type, with NADH dehydrogenase genes being relatively more prone to editing. Specifically, nad4 had the most editing sites (46), while the ribosomal protein gene rps11 had the fewest (1), as shown in Fig. 6. Of these editing sites, 157 (31.09%) were located at the first codon position and 332 (65.74%) at the second codon position. RNA editing not only changed the encoded amino acids but also generated stop codons, which prematurely terminate translation; this was observed in rps10, ccmFC, and atp9. Similarly, RNA editing created start codons in nad4L, nad1, cox2, and atp4 by converting ACG to AUG. The predictions also showed that editing most frequently produced leucine, accounting for 41.98% (212 sites) of all conversions, followed by phenylalanine at 24.36% (123 sites). After RNA editing, 44.56% of the amino acids retained their original hydrophilicity or hydrophobicity, while 45.54% changed from hydrophilic to hydrophobic and 9.31% changed from hydrophobic to hydrophilic (Table S4).

Fig. 6

The distribution of RNA editing sites among PCGs in the F. ussuriensis mitogenome

DNA migration from chloroplast to mitochondria

We identified 20 homologous fragments between the chloroplast and mitochondrial genomes of F. ussuriensis (Fig. 7). These fragments had a total length of 8,954 bp, with individual sizes ranging from 65 bp to 1,917 bp, and collectively accounted for 1.21% of the mitogenome (Table S5). During evolution, 5 genes (ndhA, ndhC, psbD, rrn16, and rrn23) lost their integrity when they migrated from the chloroplast genome to the mitogenome. In addition, 8 intact genes were found in these homologous fragments (rrn5, rps7, rps12, trnC-GCA, trnQ-UUG, trnH-GUG, trnfM-CAU, and trnW-CCA), most of which were tRNA genes.

Fig. 7

The 13 blue arcs represent the mitochondrial genome of F. ussuriensis, and the purple arc represents its chloroplast genome. The genome fragments corresponding to the green connecting lines between arcs are homologous fragments

Phylogenetic analysis

To better understand the phylogenetic position of F. ussuriensis within the monocots, we performed a phylogenetic analysis of its mitochondrial genome together with those of 24 published plant species. The selected plants comprised 12 species of Poales, 5 species of Alismatales, 4 species of Asparagales, and 1 species of Liliales, with Liriodendron tulipifera and Magnolia officinalis serving as the outgroup (Table S6). These plant mitogenomes were obtained from NCBI. A phylogenetic tree was then constructed by maximum likelihood analysis with the GTR + G model and 1,000 bootstrap replicates. Most nodes of the tree had high bootstrap support values of more than 80 (Fig. 8). Our analysis revealed a close phylogenetic relationship between F. ussuriensis and Lilium tsingtauense. The overall topology of the phylogenetic tree was congruent with the classification provided by the Angiosperm Phylogeny Group IV (APG IV) system.

Fig. 8

The phylogenetic relationships of F. ussuriensis with 24 other plant species. The tree was constructed based on the DNA sequences of 24 conserved mitochondrial PCGs (atp1, atp4, atp6, atp8, atp9, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7, nad9, ccmB, ccmC, ccmFC, ccmFN, cob, cox1, cox2, cox3, matR, and mttB)

Comparison of the F. ussuriensis mitogenome with those of five closely related species

To further explore the evolutionary characteristics of the F. ussuriensis mitochondrial genome, we compared it with those of L. tsingtauense and 4 species of Asparagales (Allium cepa, Chlorophytum comosum, Asparagus officinalis, and Crocus sativus). The GC content in the coding regions of these mitochondrial genomes ranged from 45.32% to 46.81%. The GC1, GC2, and GC3 contents ranged from 48.08% to 49.76%, 43.31% to 43.98%, and 38.26% to 38.72%, respectively. The detailed values for each species are presented in Table 2. Although GC content was relatively consistent across these species, the number of genes varied considerably. A total of 25 PCGs were shared among the species analyzed: atp1, atp4, atp6, atp8, atp9, nad1, nad2, nad3, nad4, nad4L, nad5, nad6, nad7, nad9, ccmB, ccmC, ccmFC, ccmFN, cob, cox1, cox2, cox3, matR, mttB, and rps12. C. comosum had the most mitochondrial genes (62), while A. cepa had the fewest (43). The selective pressure on PCGs is assessed by the ratio of non-synonymous (Ka) to synonymous (Ks) substitution rates (Ka/Ks): Ka/Ks > 1 indicates positive (Darwinian) selection, Ka/Ks = 1 indicates neutral evolution, and Ka/Ks < 1 indicates negative (purifying) selection [31]. The shared PCGs from the mitogenomes of the six closely related species were compared to determine the Ka/Ks ratios (Table 3). Genes nad3 and ccmB had Ka/Ks values greater than 1, indicating that they had undergone positive selection. The remaining PCGs, accounting for 92.00% of the total, all had Ka/Ks values below 1, suggesting that these genes had undergone purifying selection to maintain normal mitochondrial function.

Table 2 General characteristics of the mitochondrial genomes of F. ussuriensis, L. tsingtauense, and 4 species of Asparagales
Table 3 Ka/Ks values of 25 PCGs of F. ussuriensis versus those of L. tsingtauense and 4 species of Asparagales

Gene loss and multi-copy genes

The loss of PCGs is an important factor underlying the significant variation in mitochondrial gene content among angiosperms [17]. The ancestral angiosperm mitogenome is inferred to have contained 41 PCGs [32]. The gene sdh3 was lost in the mitogenomes of all selected species (Fig. 9). Furthermore, rpl2 and rpl10, in addition to sdh3, were lost in the mitochondrial genomes of A. cepa, C. comosum, A. officinalis, C. sativus, and L. tsingtauense. Multi-copy genes are widely observed in plant mitochondrial genomes [33]. Specifically, the mitogenome of F. ussuriensis contained two copies of atp9. Similarly, the A. cepa and C. sativus mitochondrial genomes possessed two copies of ccmFN and cox3, respectively. In contrast, 3 genes (nad6, nad9, and rps2) were present in two copies in the mitogenome of L. tsingtauense. Overall, F. ussuriensis retained a relatively complete set of PCGs, whereas A. cepa retained only the 24 core PCGs and rps12.

Fig. 9

The PCGs distribution of F. ussuriensis, L. tsingtauense, and 4 species of Asparagales. Green boxes indicate that the gene is not present in the mitogenomes. Yellow and blue boxes indicate that one and two copies exist in the particular mitochondrial genomes, respectively

Nucleotide diversity

Nucleotide diversity (Pi) measures the variation in nucleotide sequences across species, and highly variable regions can serve as potential molecular markers for population genetics research [34]. Figure 10 illustrates the nucleotide diversity of the 25 PCGs and 2 rRNA genes in the mitochondrial genomes of F. ussuriensis, L. tsingtauense, and 4 species of Asparagales. The Pi values for these genes ranged from 0.0175 to 0.1039, with most being below 0.1. Notably, atp9 exhibited the highest variability, with a Pi value of 0.1039. The genes cox3 (Pi = 0.0745), cox2 (Pi = 0.0721), and rps12 (Pi = 0.0715) also showed considerable variability. In contrast, nad1 (Pi = 0.0161) and nad7 (Pi = 0.0187) were the most conserved PCGs. The two rRNA genes (rrn5 and rrnS) were also conserved, with Pi values of 0.0162 and 0.0114, respectively. Overall, the nucleotide diversity of the PCGs varied considerably among these species.

Fig. 10

Nucleotide diversity (Pi) among the mitochondrial genomes of F. ussuriensis, L. tsingtauense, and 4 species of Asparagales

Collinear analysis

To investigate the relationships among the mitochondrial genomes of F. ussuriensis, L. tsingtauense, and four Asparagales species, we used the BLASTN program to analyze homologous sequences and their arrangement. Each connecting ribbon between two mitogenomes represented a highly homologous collinear block (Fig. 11). However, these mitogenomes exhibited limited collinearity, with many regions lacking homology among the species. In addition, the order of the collinear blocks was inconsistent among these mitochondrial genomes. These findings suggest substantial genomic rearrangements among the mitogenomes of F. ussuriensis and the other 5 species, indicating that their genomic structure is poorly conserved.

Fig. 11

Collinear analysis of the mitogenomes of F. ussuriensis, L. tsingtauense, and 4 species of Asparagales. The pink arcs indicate homologous regions of these mitochondrial genomes

Discussion

Mitochondria are regarded as the powerhouses that generate the energy required for the life processes of plants [35]. Research over the years has shown that the plant mitochondrial genome is evolutionarily dynamic and highly diverse among species [36]. Since the initial endosymbiotic event, plant mitochondrial genomes have undergone significant changes [37], resulting in a series of features that make them more complicated than those of animals [13]. Most reported plant mitogenomes range in size between 200 and 800 kb [32]. Genome size and GC content are primary features used to characterize the mitogenomes of different species [38]. Despite significant variations in genome size, GC content is relatively consistent across species, supporting the notion that it has remained conserved throughout the evolution of higher plants [5, 39]. The number of genes within mitochondrial genomes varies among plant species, but the genes essential for the main respiratory functions and energy synthesis are highly conserved [40, 41]. This conservation may contribute to the maintenance of mitochondrial function and normal life activities.

Although plant mitochondrial genomes have been commonly reported as circular structures similar to those of animals [42, 43], many studies have revealed that their actual structures can also have multiple branched, linear, or mixed forms of genomic structures [44, 45]. Multi-circular structures were also identified in the mitochondrial genomes of various species, encompassing ferns, basal angiosperms, monocots, and eudicots. For instance, the mitochondrial genome of Gelsemium elegans comprised two circular chromosomes [46], and a similar phenomenon was also observed in the Psilotum nudum mitogenome [47]. In contrast, the mitochondrial genome of Amborella trichopoda was assembled into five circular chromosomes, with sizes ranging from 118.7 kb to 3,179.3 kb [48]. In the majority of eudicots, the chromosome count of mitochondrial genomes was generally less than five [49], while the mitogenome of Gastrodia elata, a monocot plant, comprised 19 chromosomes [50]. The assembled mitochondrial genome of F. ussuriensis consisted of 13 circular chromosomes, each of which contained genes. In plant mitochondrial genomes with multiple chromosomes, some chromosomes may lack functional genes. For instance, among the 58 chromosomes of the Silene noctiflora mitogenome, 20 were discovered to contain no functional genes [51]. These results indicate that plant mitochondrial genomes are diverse and complex in terms of structure, size, and gene content.

The variation in mitogenome sizes can be primarily attributed to the insertion or loss of genetic sequences and the increase in repetitive elements [49]. The repetitive sequences in the mitogenomes are closely related to intermolecular recombination [52], which can cause drastic changes in the size and structure of the genomes [53]. Previous studies indicated that SSRs containing A and T bases were more likely to occur in the genomes of chloroplasts and mitochondria because the two hydrogen bonds that connected them were easier to break [54, 55]. Specifically, repetitive elements ranging from 0.5 kb to 120 kb were identified as key contributors to the notable size variance observed in several mitochondrial genomes of Zea mays [56]. In the F. ussuriensis mitogenome, the repeat sequences were less than 500 bp. A similar observation was made in the mitochondrial genomes of four Populus species, where the lengths of repeat sequences did not exceed 350 bp [57]. This phenomenon may contribute to the stability of size and structure of the plant mitochondrial genomes.

The frequent transfer of foreign DNA among chloroplasts, mitochondria, and nuclei also affects the size of plant mitochondrial genomes [58]. For instance, numerous DNA sequences in the mitochondrial genome of Malus domestica are highly similar to nuclear DNA sequences [59]. These sequences are also considered to be a driving force of mitogenome expansion. Similarly, in the mitogenome of Vitis vinifera, a total of 30 chloroplast fragments with a combined length of 68,237 bp were discovered, accounting for 8.80% of its mitochondrial genome [60]. In this study, the majority of the intact genes in the sequences transferred from the chloroplast genome to the mitogenome were tRNA genes, which is consistent with similar transfer processes observed in other angiosperms [4]. The transfer of sequences between these two genomes is generally considered unidirectional [61], and related studies have shown that sequences transferred from the chloroplast can compensate for tRNA genes lost during mitochondrial evolution [62]. The expression of chloroplast-derived tRNA genes in the mitochondrial genome has been demonstrated [63], and these genes participate in the transport of the relevant amino acids to maintain normal life activities [64]. Analysis of sequence transfer provides valuable insights into DNA transfer events between different genomes within cells [65], which contributes to a deeper understanding of the evolutionary process of eukaryotes.

RSCU is the ratio of the observed usage frequency of a codon to the frequency expected if all synonymous codons for the same amino acid were used equally [66]. An RSCU value exceeding 1 indicates that a codon is used more often than expected under equal synonymous usage [67]. Overall, there is a strong A/T bias at the third codon position in the mitochondrial genome of F. ussuriensis. This bias has also been found in other plant mitochondrial genomes [68, 69] and is considered to result from the long-term evolutionary adaptation of plants to their environment [70].
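The RSCU definition above can be made concrete with a short sketch. It assumes the standard genetic code and in-frame coding sequences supplied as strings; the function name `rscu` is hypothetical, and this is an illustration of the ratio rather than a reimplementation of CodonW.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in the conventional TCAG ordering
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds_list):
    """Observed codon count divided by the count expected under equal
    usage of all synonymous codons, pooled over the given CDSs."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are excluded here
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Toy input: alanine encoded three times by GCT and once by GCC
toy = rscu(["GCTGCTGCTGCC"])
print(toy["GCT"], toy["GCC"], toy["GCA"])
```

Within each amino acid family the RSCU values sum to the number of synonymous codons, so a value above 1 for one codon implies values below 1 for at least one of its synonyms.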

RNA editing is a post-transcriptional process that occurs commonly in plant mitochondrial genomes and can greatly increase the diversity of gene sequences [71]. This process modifies genetic information at the mRNA level and thus contributes to improving protein folding [72]. Reportedly, 517 RNA editing sites among 34 genes and 441 RNA editing sites across 36 genes were identified in the mitochondrial genomes of Bupleurum chinense [73] and Arabidopsis thaliana [74], respectively. Moreover, nearly 50.00% of RNA editing sites were found at the second codon position [75], and the hydrophobic characteristics of more than half of the amino acids were altered [76]. The hydrophobicity of amino acids is closely related to protein folding and secondary structure formation [77]. Editing that converts sense codons into stop codons can prematurely terminate protein synthesis, as also reported in other plants [78, 79]. Despite the limitations of existing prediction methods and other factors leading to low average coverage [80], the identification of RNA editing sites still provides valuable information for predicting gene functions.

In this study, we compared the mitogenome of F. ussuriensis with those of L. tsingtauense, and 4 species of Asparagales to learn more about its structure and organization. The comparisons revealed that these mitogenomes had undergone substantial rearrangements, exhibiting an exceptional lack of conservation in structure across these species, which might serve as a fundamental driving force behind the evolution and diversification of plant mitogenomes [81]. The analysis of nucleotide diversity revealed that the rRNA genes were relatively conserved, a finding consistent with similar reports in the previous study [82]. The selection pressure on PCGs was usually estimated by the ratio of Ka/Ks, which helped deepen the understanding of the evolution of plant mitogenomes [31]. The phenomenon of relatively conserved PCGs in the mitochondrial genomes is also common in related studies of other plants [73, 83]. In natural selection, the Ka/Ks value of most genes in plant mitochondrial genomes was smaller than 1 in order to remove the deleterious mutations and maintain mitochondrial functions [84]. The maximum likelihood analysis established a preliminary evolutionary position of F. ussuriensis at the mitochondrial genome level and indicated a close affinity between F. ussuriensis and L. tsingtauense. Our findings underscored the utility of organelle genomic data in clarifying plant phylogenetic relationships, advancing the development of molecular markers, and fostering studies on genetic evolution.

Conclusions

Using a combined strategy of Illumina and PacBio sequencing, we obtained high-quality sequencing data for the F. ussuriensis mitogenome. We then assembled and annotated this genome, which had a total length of 737,569 bp and consisted of 13 circular chromosomes. Within this genome, 55 genes were annotated, including 41 PCGs, 12 tRNA genes, and 2 rRNA genes. Codon usage analysis suggested a preference for A-ended or U-ended codons in this mitogenome. We identified 192 SSRs and 4,270 dispersed repeat sequences. Most of the intact genes in the sequences transferred from the chloroplast genome to the mitogenome of F. ussuriensis were tRNA genes. A total of 505 RNA editing sites were predicted in the PCGs of this mitogenome. Based on the phylogenetic analysis of the mitogenomes of F. ussuriensis and 24 other plants, the evolutionary position of this plant was clarified and was largely consistent with the traditional classification. Additionally, Ka/Ks analysis, nucleotide diversity analysis, and comparative analysis of genomic features were performed to provide a more comprehensive understanding of mitogenome evolution in these closely related species. In summary, our analyses provide important information for biological research on F. ussuriensis and support further determination of evolutionary relationships within Liliaceae.

Materials and methods

Plant materials, DNA extraction and sequencing

Etiolated seedlings of F. ussuriensis that had been covered with soil were collected from the medicinal plant plantation in Tieli City, Heilongjiang Province, China (46°58′28″ N, 127°57′7″ E). The sample was deposited at the herbarium of Jiamusi University (Jiamusi, China) under accession number PBM202301. Mitochondria were isolated from the leaves of these etiolated seedlings by density gradient centrifugation [85] and treated with DNase I to eliminate contaminating DNA from other genomes [86]. DNA was then extracted from the purified mitochondria. Approximately 2 µg of mitochondrial DNA was converted into SMRTbell libraries using the PacBio Express Template Prep Kit 2.0 [15] and sequenced on the PacBio Sequel II platform [87]. Roughly 1 µg of mitochondrial DNA was sonicated into fragments of approximately 500 bp with a Covaris M220 system [88], and the fragments were purified with the TIANgel Midi Purification Kit [89]. Libraries were constructed with the NEBNext® Ultra™ DNA Library Prep Kit and sequenced on the Illumina NovaSeq 6000 platform [30].

Assembly and annotation of the mitogenome

The mitogenome of F. ussuriensis was assembled using a hybrid strategy that combined Illumina and PacBio reads. Initially, FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was used to check the quality of the raw short reads [79], which were then trimmed with Trimmomatic (v0.36) to produce clean reads [90]. GetOrganelle (v1.6.4) was used with default parameters to build contigs from the clean short reads [91]. The long reads were assembled de novo with Canu (v2.1.1) [92], and the resulting contigs were aligned against the plant mitogenome database to extract potential mitochondrial contigs. We mapped the short reads to the extracted contigs with BWA (v0.7.17) and SAMtools (v0.1.19), retaining all mapped reads [93, 94]. The hybrid assembly of PacBio long reads and Illumina short reads was performed with SPAdes (v3.9.0) using multiple k-mer values [95]. Subsequently, we mapped the PacBio long reads onto the repeat regions of the assembled mitochondrial scaffolds to resolve these regions. Contigs generated by SPAdes were imported into Bandage to visualize and analyze the assembly [96]. All 24 highly conserved core genes were included in our assembly [30], and all mitochondrial fragments were fully extended. Moreover, the raw reads were aligned to the assembled mitogenome, which was then polished with Pilon (v1.18) to correct errors [97]. Finally, this assembly strategy yielded 13 simplified circular contigs, representing the complete mitogenome of F. ussuriensis.

Using the mitochondrial genomes of A. cepa (NC_030100.1), C. comosum (MW411187.1), A. officinalis (NC_053642.1), and L. tsingtauense (OP973783.1-OP973810.1) from GenBank as references, the mitogenome of F. ussuriensis was annotated with GeSeq [98]. In addition, tRNAscan-SE and BLASTN were used to annotate tRNA and rRNA genes, respectively [99, 100]. We then checked and corrected errors in the annotation with Apollo (v1.11.8) [101]. Finally, the genome map was generated with the OGDRAW program (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html) [102].

Analysis of repeat sequences

SSRs were identified with MISA (https://webblast.ipk-gatersleben.de/misa/) [103], with minimum repeat numbers of 10, 5, 4, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexanucleotide motifs, respectively. The distance threshold between two SSRs was set to 100 bp. Forward, reverse, palindromic, and complement repeat sequences were identified with REPuter (https://bibiserv.cebitec.uni-bielefeld.de/reputer/) using a minimum length of 30 bp and a Hamming distance of 3 [104].
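The SSR thresholds listed above can be illustrated with a small regular-expression search. This is a simplified sketch, not MISA itself: it ignores compound SSRs and the 100 bp distance setting, and the names `MIN_REPEATS` and `find_ssrs` are hypothetical.

```python
import re

# Minimum repeat counts per motif length, taken from the thresholds stated above
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(sequence):
    """Return (start, motif, repeat_count) for perfect SSRs in `sequence`.

    `start` is a 0-based index. Motifs that are themselves repeats of a
    shorter unit (e.g. "AA" or "ATAT") are skipped so that each SSR is
    reported once, under its shortest motif.
    """
    seq = sequence.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One copy of the motif followed by at least (min_rep - 1) more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if unit_len > 1 and motif in (motif + motif)[1:-1]:
                continue  # motif is periodic, i.e. built from a shorter unit
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


toy = "GG" + "A" * 10 + "C" + "AT" * 5 + "G" + "AGCT" * 3
for start, motif, count in find_ssrs(toy):
    print(start, motif, count)
```

Counting the hits by motif length would give the kind of per-class tallies (monomeric, dimeric, and so on) reported in the Results.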

Analysis of codon composition

The codon composition and usage preference of the F. ussuriensis mitochondrial genome were analyzed with CodonW (v1.4.4) [105]. The analysis included the GC content, the number of codons per gene, RSCU values, and ENC.

Prediction of RNA editing sites

The RNA-editing sites of the PCGs in F. ussuriensis mitogenome were predicted using Deepred-mt [80]. This tool has high accuracy through the use of a convolutional neural network model. The results with the probability above 0.9 were retained.

Selective pressure analysis

The mitochondrial PCGs of A. cepa, C. comosum, A. officinalis, C. sativus, and L. tsingtauense were used to calculate the Ka/Ks values of the PCGs in the F. ussuriensis mitogenome. The mitochondrial PCGs of the selected species and F. ussuriensis were aligned using MEGA (v7.0) [106], and the Ka/Ks values were determined with KaKs_Calculator (v2.0) [107].
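The interpretation of Ka/Ks values used in this study (greater than 1, equal to 1, or less than 1) can be written as a small helper. This is a minimal sketch with hypothetical function names; it does not estimate Ka or Ks, which KaKs_Calculator derives from codon alignments.

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero or negative (ratio undefined)."""
    if ks <= 0:
        return None
    return ka / ks


def classify_selection(ka, ks):
    """Map a Ka/Ks ratio to the selection regime it is taken to indicate."""
    ratio = ka_ks_ratio(ka, ks)
    if ratio is None:
        return "undefined"
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "purifying"
    return "neutral"


# Hypothetical values for illustration only (not taken from Table 3)
print(classify_selection(0.02, 0.01))
print(classify_selection(0.01, 0.05))
```

In practice, ratios with Ks close to zero are unstable, so such gene pairs are usually inspected separately rather than classified automatically.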

Homologous fragments between chloroplast genome and mitochondrial genome

First, the chloroplast genome sequence of F. ussuriensis was downloaded from NCBI. Homologous fragments between the chloroplast and mitochondrial genomes of F. ussuriensis were identified using BLASTN with the following parameters: -word_size 9, -gapopen 5, -gapextend 2, and -evalue 1e-5 [100]. The identified fragments were then annotated using GeSeq [98], and the results were visualized with the Circos package (v0.69) [108].
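For illustration, the BLASTN parameters above can be assembled into a command line as sketched below. The file names are hypothetical, and the use of `-subject`, `-outfmt 6` (tabular output), and `-out` is an assumption, since the original text does not state how the search was set up or which output format was used.

```python
def build_blastn_command(query_fasta, subject_fasta, output_tsv):
    """Assemble a blastn argument list using the parameters stated in the text.

    -subject, -outfmt 6 and -out are assumptions added for this sketch.
    """
    return [
        "blastn",
        "-query", query_fasta,
        "-subject", subject_fasta,
        "-word_size", "9",
        "-gapopen", "5",
        "-gapextend", "2",
        "-evalue", "1e-5",
        "-outfmt", "6",
        "-out", output_tsv,
    ]


def parse_tabular_hits(lines):
    """Parse default 12-column tabular BLAST output into dictionaries."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append({
            "query": fields[0],
            "subject": fields[1],
            "identity": float(fields[2]),
            "length": int(fields[3]),
            "evalue": float(fields[10]),
        })
    return hits


# Hypothetical file names
command = build_blastn_command("chloroplast.fasta", "mitogenome.fasta", "cp_vs_mt.tsv")
print(" ".join(command))
```

The list could be passed to `subprocess.run(command, check=True)` on a system where BLAST+ is installed. The colinear analysis described later additionally used -reward 2 and -penalty -3, which would be appended in the same way.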

Phylogenetic analysis

The mitogenomes of 24 other plants were downloaded from the NCBI database to analyze the evolutionary position of F. ussuriensis among these species. PhyloSuite (v.1.2.2) was used to extract the PCGs shared among the mitogenomes of the selected species [109], and MAFFT (v7.471) was used to align the extracted sequences [110]. The phylogenetic tree was constructed by the maximum likelihood (ML) method in MEGA (v7.0) with the GTR + G model and 1,000 bootstrap replicates [106]. Furthermore, the heatmap of PCG distribution was created using TBtools (v1.18) [111].

Nucleotide diversity (Pi) analysis

The MAFFT software (v7.471) was employed for the global comparison of homologous gene sequences across diverse species [110], while DnaSP (v5.0) was utilized to calculate the Pi value corresponding to each gene [112].
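As an illustration of the quantity being calculated, nucleotide diversity can be computed from an alignment as the average proportion of differing sites over all sequence pairs. The sketch below assumes pre-aligned sequences of equal length and drops any column containing a gap or ambiguous base; the function name is hypothetical, and DnaSP's handling of such sites may differ.

```python
from itertools import combinations


def nucleotide_diversity(aligned_sequences):
    """Average pairwise differences per site over an alignment (Pi)."""
    seqs = [s.upper() for s in aligned_sequences]
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # Keep only columns where every sequence has an unambiguous base
    columns = [i for i in range(length) if all(s[i] in "ACGT" for s in seqs)]
    if not columns:
        raise ValueError("no comparable sites in the alignment")

    pairs = list(combinations(seqs, 2))
    differences = sum(
        sum(a[i] != b[i] for i in columns) for a, b in pairs
    )
    return differences / (len(pairs) * len(columns))


# Toy alignment: one of three sequences differs at one of four sites
print(nucleotide_diversity(["ACGT", "ACGA", "ACGT"]))
```

Applying such a function to each gene alignment separately would yield per-gene values comparable in kind to those plotted in Fig. 10.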

Collinear analysis

The mitochondrial genomes of F. ussuriensis, L. tsingtauense, and 4 species of Asparagales were compared by sequence similarity using BLASTN to perform the collinear analysis, with the following parameters: -evalue 1e-5, -word_size 9, -gapopen 5, -gapextend 2, -reward 2, and -penalty -3 [100]. TBtools was used to generate a multiple synteny plot [111].