Introduction

The endosymbiosis theory, currently the most commonly accepted hypothesis to explain the origin of organelles in eukaryotic cells, suggests that mitochondria arose when an α-proteobacterium, the ancestor of mitochondria, invaded an archaea-like host cell approximately 1.5 billion years ago, slightly before the origin of chloroplasts (Gupta and Golding 1996). Since the first sequencing of a mitochondrial genome in a land plant (Marchantia polymorpha) in 1992 (Oda et al. 1992), the National Center for Biotechnology Information (NCBI) has collected 327 mitogenomes from land plants (as of March 2022), far fewer than the 6104 chloroplast genomes that have been sequenced. To some extent, this lag reflects the difficulty and complexity of extracting plant mitochondrial DNA and of genome assembly, but as the cost of sequencing decreases and assembly techniques improve, the number of plant mitogenomes is expected to increase rapidly.

Plant mitogenomes are typically species specific, with large differences in size and structure among species (Alverson et al. 2010). The mitogenomes of some plants contain a high number of introns and undergo trans-splicing after transcription (Handa 2003; Malek et al. 1997). In addition, plant mitogenomes are characterized by a wide range of RNA editing sites and universal genetic code usage (Gray and Covello 1993; Kubo et al. 2000). In contrast to the relatively stable genome size of animal mitochondria and plant chloroplasts, the size of mitogenomes in higher plants varies greatly, from 66 kb in Viscum scurruloideum (Santalaceae) to 11.3 Mb in Silene conica (Caryophyllaceae) (Skippington et al. 2015; Sloan et al. 2012), and differs even among species of the same genus, for example from 562.92 kb in Salix polaris to 735.17 kb in Salix cardiophylla (Salicaceae) (Chen et al. 2020a, b). This difference in mitogenome size is due in large part to the insertion of foreign DNA and extensive replication of noncoding regions (Bergthorsson et al. 2003; Wynn and Christensen 2019). The number of genes in plant mitogenomes also varies, ranging from 19 (Viscum album) to 221 (Capsicum annuum, Solanaceae) (Petersen et al. 2015; Jo et al. 2011). Although plant mitogenomes vary in size and number of genes, the genes encoding ATP synthase, NADH dehydrogenase, coenzyme Q-cytochrome c reductase, and cytochrome c oxidase are relatively conserved (Bi et al. 2020). Other genes, such as those encoding succinate dehydrogenase and ribosomal proteins, are not conserved and have been lost on occasion during plant evolution. For example, sdh3 has been lost in all studied species of Salicaceae, and rps11 has also been frequently lost in many angiosperms (Bi et al. 2016). Although the presence of these genes is not conserved across species, their sequences are relatively conserved (Allen et al. 2007; Chaw et al. 2008; Goremykin et al. 2009).

Structurally, unlike plant chloroplast and animal mitochondrial genomes, plant mitogenomes adopt a variety of conformations, including not only the common single circular structure but also linear structures and multichromosomal conformations. For example, the mitogenome of Lactuca saligna forms a branched linear structure (Kozik et al. 2019); the Cucumis sativus mitogenome has three independent chromosomal structures (Alverson et al. 2011); the mitogenome of Scutellaria tsinyunensis has a double-ring crossover structure (Li et al. 2021); and that of Lactuca sativa appears to undergo a dynamic change as evidenced by cryoelectron microscopy (Kozik et al. 2019). These features also indicate the instability of the plant mitogenome structure and make the assembly of mitogenomes in higher plants more difficult.

Populus deltoides (section Aigeiros) is an important timber species with a natural distribution from southern Canada to the southeastern United States (Fahrenkrog et al. 2017). The trees are tall, leafy and geographically adaptable, and their wood is an important raw material for wood-based panels and papermaking. Cultivar I-69 is one of the most important cultivars of P. deltoides, with characteristics such as rapid growth and resistance to brown spot disease, and has been promoted worldwide and widely used for afforestation in China (Bai et al. 2021). Therefore, the study of P. deltoides I-69 at the genomic level is of both theoretical and practical importance.

At present, the nuclear and chloroplast genomes of P. deltoides have been assembled and annotated, but its mitogenome has not been studied. In this study, the first P. deltoides mitogenome was assembled and annotated by combining Nanopore and Illumina sequencing data using P. deltoides I-69 as the material. The gene content, codon usage bias, repeat sequences, RNA editing sites, genome collinearity and gene transfer between different genomes were analyzed, and phylogenetic analyses were performed based on conserved mitochondrial genes. The results of this study further support the existence of multiple conformations of the plant mitogenome and provide valuable information for the further study of the phylogenetic status of P. deltoides.

Materials and methods

Plant materials, DNA extraction and sequencing

P. deltoides I-69 branches were collected at Siyang Farm in Siyang County, Jiangsu Province, China (33°40′ N, 118°41′ E), and propagated in soil at the Xiashu Forestry Farm, Jurong City, Jiangsu Province, China (32°7′ N, 119°12′ E). Fresh leaves were collected and stored at −80 °C. Genomic DNA was extracted from frozen leaves using the CTAB method (Arseneau et al. 2017). The concentration and purity of the DNA were determined using Qubit and NanoDrop instruments (Thermo Fisher Scientific), respectively. Size selection (> 20 kb) was performed using 15 μg of genomic DNA and the BluePippin system (Sage Science, Beverly, MA, USA), and PCR-free libraries were constructed using the standard Nanopore protocol (Oxford Nanopore, Oxford, UK). The libraries were loaded on two flow cells and sequenced at Shanghai Biozeron Biotechnology Co., Ltd and Nextomics Biosciences Co., Ltd. Base calling of the raw signal data was performed with ONT MinKNOW software, and statistics for the generated Nanopore sequencing data were calculated using NanoStat (v1.4.2) (De Coster et al. 2018).

In addition, for leaf samples from the same individual, the whole genome was sequenced using the Illumina HiSeq 2000 sequencing platform by Biomarker Technologies Corporation, Beijing, China. To correct contigs assembled from Nanopore sequencing data, these sequencing data were processed using the NGS QC Toolkit (v2.3.3) with default parameters to obtain high-quality reads (Patel and Jain 2012).

Mitogenome assembly and annotation

De novo assembly of the P. deltoides mitogenome was performed using the Oxford Nanopore data obtained by sequencing in this study and the wtdbg2 software (Ruan and Li 2020). The contigs obtained after preliminary assembly were polished by Pilon software (v1.18) in combination with Illumina sequencing data for error correction (Walker et al. 2014).

The complete mitogenome of P. deltoides was annotated using GeSeq (Tillich et al. 2017), with the mitogenome of Populus tremula (NC_028096.1) as the reference genome. Protein-coding genes (PCGs) of the P. deltoides mitogenome were manually compared with those in the mitogenomes of other Populus species, then manually corrected using MacVector (Bernt et al. 2013). tRNA genes were predicted using tRNAscan-SE with default parameters (Chan and Lowe 2019; Chan et al. 2021). rRNA genes were identified using RNAmmer (v1.2) and manually corrected with local BLASTn (Camacho et al. 2009; Lagesen et al. 2007). The OGDRAW web server was used to visualize the complete multicircular P. deltoides mitogenome (Greiner et al. 2019).

Sequence repeat analysis

Simple sequence repeats (SSRs) were analyzed using the MISA program (Beier et al. 2017), and the minimum numbers of repeats for motifs of 1–6 nucleotides were set to 8, 4, 4, 3, 3, and 3, respectively. Tandem repeat sequences in the P. deltoides mitogenome were analyzed by Tandem Repeats Finder (v4.09; Benson 1999) with default parameters. Finally, the size and position of the dispersed repeats, including forward repeats, palindromic repeats, reverse repeats and complement repeats, were detected using REPuter (Kurtz et al. 2001); the minimum repeat length was set to 30 bp, the Hamming distance was set to 3, and repeats containing overlapping regions were manually verified and merged.
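The SSR thresholds described above can be illustrated with a minimal Python sketch. It is not the MISA program and does not reproduce MISA's handling of compound or interrupted SSRs; the function names (is_primitive, find_ssrs) and the test sequence are hypothetical, and only perfect, non-overlapping repeats are reported in a single left-to-right scan.

```python
# Minimum number of repeats per motif length (1-6 nt), as given in the text.
MIN_REPEATS = {1: 8, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}
VALID_BASES = set("ACGT")


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect SSRs; start is 0-based."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for k in sorted(min_repeats):  # try the shortest motif first
            motif = seq[i:i + k]
            if len(motif) < k or not set(motif) <= VALID_BASES or not is_primitive(motif):
                continue
            count = 1
            while seq[i + count * k:i + (count + 1) * k] == motif:
                count += 1
            if count >= min_repeats[k]:
                found = (i, motif, count)
                break
        if found:
            hits.append(found)
            i += len(found[1]) * found[2]  # jump past the reported SSR
        else:
            i += 1
    return hits


# Hypothetical example: (A)8 and (AT)4 pass the thresholds, (AGC)3 does not.
example = "GG" + "A" * 8 + "CGC" + "AT" * 4 + "C" + "AGC" * 3 + "T"
print(find_ssrs(example))
```

In this example the trinucleotide run is skipped because three repeats fall below the minimum of four set for trinucleotide motifs.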

Codon usage analysis and RNA editing site prediction

The relative synonymous codon usage (RSCU) of P. deltoides was calculated using MEGA X (Kumar et al. 2018). The amino acid compositions of PCGs of P. deltoides, P. tremula, Salix suchowensis, Gossypium raimondii, Arabidopsis thaliana, Ginkgo biloba and Cycas taitungensis were analyzed using a local Perl language script (Kumar et al. 2016).

RNA editing sites in the mitogenome of P. deltoides and four other species (S. suchowensis, P. tremula, G. biloba and Pinus taeda) were predicted by PREP-Mt online software (Mower 2009) with a C value set to 0.2.

Multigenomic alignment

The nucleotide sequences of the P. deltoides mitogenome and chloroplast genome (NC_040929.1) were aligned using BLASTn (version 2.9.0) with E-value ≤ 1e−10, matching rate ≥ 80%, and length ≥ 40 bp.

To explore the mitogenome collinearity relationship between different poplars, we aligned the nucleotide sequences of the P. deltoides and P. tremula mitogenomes using BLASTn with E-value ≤ 1e−10, matching rate ≥ 80%, and length ≥ 1000 bp. The synteny circle plots for the above two analyses were drawn with TBtools (Chen et al. 2020a, b).
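The filtering criteria above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in this study: it assumes that BLASTn was run with tabular output (-outfmt 6, default column order) and that "matching rate" corresponds to the percent-identity column; the function name filter_hits and the example records are hypothetical.

```python
def filter_hits(lines, max_evalue=1e-10, min_identity=80.0, min_length=40):
    """Keep BLASTn tabular hits (-outfmt 6) that pass all three thresholds."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        # Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        pident = float(fields[2])
        length = int(fields[3])
        evalue = float(fields[10])
        if evalue <= max_evalue and pident >= min_identity and length >= min_length:
            kept.append({"query": fields[0], "subject": fields[1],
                         "identity": pident, "length": length, "evalue": evalue})
    return kept


# Hypothetical records: only the first passes all three thresholds.
records = [
    "mito\tcp\t99.5\t1200\t3\t1\t100\t1299\t500\t1699\t0.0\t2150",
    "mito\tcp\t78.0\t300\t60\t4\t10\t309\t20\t319\t1e-40\t180",
    "mito\tcp\t95.0\t35\t1\t0\t5\t39\t7\t41\t1e-12\t60",
    "mito\tcp\t90.0\t50\t5\t0\t1\t50\t1\t50\t1e-5\t50",
]
print(filter_hits(records))
```

For the mitogenome collinearity comparison, the same function would be called with min_length=1000.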

Phylogenetic analysis

The genes for 43 species (see Supplementary Table S1 for all NCBI accessions) were first extracted using a local Perl script to build a matrix, from which a total of 20 conserved genes were selected; the sequences of these genes were then concatenated for each species by another local Perl script. Multiple sequence alignment was then performed with MUSCLE using default parameters (Edgar 2004). Finally, IQ-TREE (Minh et al. 2020) was used to construct the maximum likelihood phylogenetic tree with 1000 bootstrap replicates; the best-fit model, determined by ModelFinder (Kalyaanamoorthy et al. 2017; Minh et al. 2020), was GTR + F + R10, and two gymnosperms (G. biloba and P. taeda) were used as outgroups.
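The gene-linking step can be pictured with the following Python sketch. The Perl scripts used in this study are not shown in the text, so this is only an assumed equivalent: the function name concatenate_genes, the data layout (a dictionary of gene sequences per species) and the toy sequences are all hypothetical.

```python
def concatenate_genes(genes_by_species, gene_order):
    """Join the sequences of the shared genes in a fixed order for every species."""
    supermatrix = {}
    for species, genes in genes_by_species.items():
        missing = [g for g in gene_order if g not in genes]
        if missing:
            # Shared genes must be present in every species.
            raise KeyError(f"{species} lacks: {', '.join(missing)}")
        supermatrix[species] = "".join(genes[g] for g in gene_order)
    return supermatrix


# Toy input with two genes; real input would hold the 20 conserved genes.
toy = {
    "P. deltoides": {"atp1": "ATGAAA", "cob": "ATGCCC", "rps1": "ATGTTT"},
    "P. tremula": {"atp1": "ATGAAG", "cob": "ATGCCA"},
}
print(concatenate_genes(toy, ["atp1", "cob"]))
```

Using the same gene order for all species keeps homologous genes in the same region of the concatenated sequence before alignment.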

Results and discussion

Assembly and annotation of the P. deltoides mitogenome

A total of 44.3 Gb of Nanopore sequencing data were generated, comprising 2.7 million reads with an average length of 16 kb and an N50 of 25 kb, which corresponds to approximately 100× coverage of the P. deltoides genome. A total of 235.4 million Illumina reads (23.8 Gb) were generated. All Nanopore data were used for de novo assembly, and ~55× Illumina sequencing reads were used for error correction to ensure the accuracy of the results. Finally, the complete P. deltoides mitogenome was obtained.

The total length of the P. deltoides mitogenome is 802,637 bp, which is similar to that of P. tremula (783,442 bp) and P. alba (838,420 bp) (Kersten et al. 2016; Brenner et al. 2019). However, unlike the single circular structure common among plant mitogenomes, the P. deltoides mitogenome was assembled into three circular chromosomes with lengths of 336,205 bp (Chr1), 280,841 bp (Chr2), and 185,591 bp (Chr3) (Fig. 1), and the three chromosome sequences were submitted to the NCBI Genome Database (MZ951174.1, MZ951175.1, MZ951176.1). The nucleotide composition of the complete P. deltoides mitogenome was 27.67% A, 22.36% C, 22.42% G, and 27.56% T, with a GC content of 44.77% (Table 1). The GC contents of the three independent chromosomes were approximately the same: 44.89% (Chr1), 44.65% (Chr2), and 44.75% (Chr3).
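As a brief illustration of how these summary statistics are obtained, the following Python sketch computes base composition and GC content from a sequence and checks that the three reported chromosome lengths sum to the reported total. The function names and the short test sequence are hypothetical; the chromosome lengths are those given above.

```python
from collections import Counter

# Chromosome lengths (bp) as reported in the text.
CHR_LENGTHS = {"Chr1": 336205, "Chr2": 280841, "Chr3": 185591}


def base_composition(seq):
    """Percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def gc_content(seq):
    """GC content in percent."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


print(sum(CHR_LENGTHS.values()))   # total mitogenome length in bp
print(gc_content("ATGCGCTA"))      # toy sequence with 4 of 8 bases G or C
```

Applied to the three chromosome sequences, the same functions would yield the per-chromosome values listed in Table 1.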

Fig. 1

Circular map of Populus deltoides mitogenome Chr1, Chr2 and Chr3. The genes on the outside are transcribed in the clockwise direction and those inside the circle in counterclockwise direction. GC content is represented by the dark gray plot in the inner circle. Genes with an asterisk (*) have introns

Table 1 Genomic features of the Populus deltoides mitogenome

The 58 genes that were identified in the mitogenome of P. deltoides included 34 PCGs, 21 tRNA genes and 3 rRNA genes (Table 2). There were 28 genes on Chr1 (13 complete PCGs, 11 tRNA genes and all 3 rRNA genes), 22 genes on Chr2 (13 complete PCGs, 7 tRNA genes), and 10 on Chr3 (6 complete PCGs and 3 tRNA genes). Interestingly, we found that the exons of the gene nad1 were distributed on Chr1 and Chr2, and the exons of nad5 were distributed on Chr2 and Chr3. In addition, three copies of the trnM-CAU gene were found, all on Chr1, and two copies of the trnP-UGG gene were found, all on Chr2.

Table 2 Gene annotation of the Populus deltoides mitogenome

As shown in Table 1, the total length of the PCGs in the P. deltoides mitogenome was 30,483 bp, accounting for 3.80% of the entire mitogenome; the total length of tRNA genes and rRNA genes accounted for 0.20 and 0.67%, respectively; cis-spliced introns accounted for 3.37%, and the total length of noncoding regions was 738,125 bp, accounting for 91.96%. Noncoding sequences occupy more than 80% of the mitogenome in most plants. These noncoding regions consist of repetitive fragments, sequences transferred from the chloroplast and nuclear genomes, and even sequences from other species acquired by horizontal gene transfer; for example, the mitogenome of the oldest relict angiosperm, Amborella trichopoda, contains a large number of sequence fragments from mosses, green algae, and other plants (Rice et al. 2013). Many of these sequences are relatively short, repetitive fragments, which is an important reason why plant mitogenomes recombine and rearrange so readily (Alverson et al. 2011; Rice et al. 2013).

Codon usage analysis of PCGs

ATG is a typical start codon in plant mitochondria, but plant mitochondria may also use several other start codons (Bock et al. 1994; Dong et al. 1998). The total length of the PCGs of P. deltoides is 30,483 bp. Most of these PCGs have a typical ATG start codon, but the start codon is ATT for mttB and GGG for rpl16, which may be the result of RNA editing. In addition, three different stop codons were identified in the PCGs, namely, TAA (atp1, atp6, cox1, cox2, nad1, nad2, nad3, nad4L, nad5, nad9, rps1, rps3, rps4, rps7, rpl2, rpl10, rpl16, and sdh4), TAG (atp4, atp8, atp9, ccmFc, matR, mttB, nad6, nad7, and rps14) and TGA (ccmB, ccmC, ccmFn, cob, cox3, nad4, and rps12) (Supplementary Table S2). As shown in Fig. 2, we analyzed the codon usage of P. deltoides, P. tremula, S. suchowensis, G. raimondii, A. thaliana, G. biloba, and C. taitungensis. The results showed that the three most commonly used amino acids in plant mitochondria were leucine (Leu), serine (Ser), and arginine (Arg), while methionine (Met) and tryptophan (Trp) were rarely used (Handa 2003; Chaw et al. 2008; Bi et al. 2016; Guo et al. 2016; Kersten et al. 2016; Ye et al. 2017).

Fig. 2

Codon usage pattern in the Populus deltoides mitogenome compared with those of P. tremula, Salix suchowensis, Gossypium raimondii, Arabidopsis thaliana, Ginkgo biloba and Cycas taitungensis. The fraction of each amino acid residue in all mitochondrial proteins is given by the y-axis

We found that the amino acid composition of mitochondrial proteins was very similar among the Salicaceae species. P. deltoides also had an amino acid composition similar to those of A. thaliana and G. raimondii. In the gymnosperms G. biloba and C. taitungensis, the frequencies of Gly, Pro and Arg were significantly higher than in the angiosperms examined. Proline and arginine are involved in many stages of plant growth and development, especially in physiological and biochemical processes such as stress resistance. Glycine can also improve plant stress resistance, which may explain the longer lifespan of gymnosperms. The frequencies of lysine (Lys) and leucine (Leu), which promote photosynthesis and regulation, are significantly lower in G. biloba and C. taitungensis than in the five angiosperms, which may be why G. biloba and C. taitungensis, as gymnosperms, grow more slowly than angiosperms (Chaw et al. 2008; Guo et al. 2016). Except for the above amino acids, the distribution of other amino acids between angiosperms (P. deltoides, P. tremula, S. suchowensis, G. raimondii, and A. thaliana) and gymnosperms (G. biloba and C. taitungensis) is relatively conserved.

Relative synonymous codon usage (RSCU) is the observed frequency of a codon relative to the frequency expected if all synonymous codons for the same amino acid were used equally. An RSCU of 1 indicates no preference for that codon, whereas RSCU > 1 indicates that the codon is used more frequently than expected (Yang et al. 2021; Sau et al. 2006). Figure 3 shows the RSCU analysis of the mitogenome of P. deltoides; all codons are present in the PCGs.
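The RSCU definition can be made concrete with a short Python sketch. It is an illustration only and not the MEGA X implementation: it assumes the standard genetic code, treats the three stop codons as one synonymous family, and uses a hypothetical function name (rscu) and toy coding sequence. For each codon, the count is multiplied by the number of synonymous codons and divided by the total count for that amino acid.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds_list):
    """RSCU per codon, pooled over all coding sequences in cds_list."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # ignore codons with ambiguous bases
                counts[codon] += 1
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy example: alanine is encoded three times by GCT and once by GCA.
toy_values = rscu(["GCTGCTGCTGCA"])
print(toy_values["GCT"], toy_values["GCA"], toy_values["GCC"])
```

In the toy example GCT is used three times as often as expected under equal usage, GCA exactly as expected, and GCC and GCG not at all.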

Fig. 3

RSCU values of the Populus deltoides mitogenome. The x-axis indicates the codon families

As shown in Fig. 3, the RSCU values of almost all NNA and NNT codons were greater than or equal to 1.00 except for Leu (CUA, 0.99), Ile (AUA, 0.92), Ser (UCA, 0.97; AGU, 0.94) and Arg (CGU, 0.88). This phenomenon indicates high A/T content at the third codon position in the mitogenome of P. deltoides, which is very similar to what has been reported in the mitogenomes of other land plants. The strong AT bias at the third codon position is considered to be a universal phenomenon in higher plants (Bi et al. 2020; Wang et al. 2021; Yang et al. 2021).

Analysis of repeat regions

Compared with chloroplasts, plant mitochondria are characterized by a large distribution of repetitive sequences, and most of the size differences and rearrangements among plant mitogenomes can be explained by the size and arrangement of repetitive sequences, which play an important role in the evolution of plant mitochondria (Tanaka et al. 2014; Wynn and Christensen 2019; Yang et al. 2021).

SSRs are highly variable tandem repeats with motifs of 1–6 bases; they are frequent in plant genomes and are widely used as DNA markers for species identification and genetic diversity research (Ma et al. 2017; Li and Ye 2020). A total of 714 SSRs were detected in the mitogenome of P. deltoides (Table 3), including 302 mononucleotide repeats (42.30%), 277 dinucleotide repeats (38.80%), 27 trinucleotide repeats (3.78%), 87 tetranucleotide repeats (12.18%), 18 pentanucleotide repeats (2.52%) and 3 hexanucleotide repeats (0.42%). Among them, more than 80% were mononucleotide and dinucleotide repeats, comparable to the findings for a variety of species such as S. suchowensis, G. raimondii, and Gleditsia sinensis (Xu et al. 2017). Further analysis showed that 90.06% of the mononucleotide SSRs were A or T. A higher A/T content in SSRs may contribute to the higher AT abundance in the mitogenome of P. deltoides, which is consistent with the results for many other angiosperms (Bi et al. 2020; Yang et al. 2021). Our results provide valuable reference information for molecular marker development and population variation studies of P. deltoides.

Table 3 Frequency of identified SSRs in the Populus deltoides mitogenome

As shown in Table 4, 14 tandem repeats were identified in the mitogenome of P. deltoides, with unit lengths ranging from 11 to 33 bp, of which 7 were distributed on Chr1, 3 on Chr2, and 4 on Chr3. Further analysis showed that 13 of them were located in intergenic regions, and only one tandem repeat on Chr1 was located in a coding region (rrnL, [CATAGTCGCGAGCTGTTT] × 2). The predominance of SSRs and tandem repeats in intergenic regions suggests that most repetitive sequences are unlikely to affect the transcription, translation and functional expression of coding genes (Bi et al. 2016).

Table 4 Distribution of tandem repeats in the P. deltoides mitogenome

According to previous descriptions, in addition to SSRs and tandem repeats, there are a large number of dispersed repeat sequences with repeat units larger than 30 bp in the plant mitogenome, which are considered to be long repeat sequences, including forward repeats (F), palindromic repeats (P), reverse repeats (R) and complement repeats (C) (Bi et al. 2016). In this study, 428 dispersed repeats larger than 30 bp were found in the mitogenome of P. deltoides, with a total length of 18,863 bp, accounting for 2.35% of the total length of the mitogenome. Among the 428 repeat sequences were 212 forward repeats, 215 palindromic repeats and 1 reverse repeat. No complement repeats were found.
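The four categories of dispersed repeats differ only in how the second copy relates to the first, which the following Python sketch illustrates. It is not the REPuter algorithm (which searches the whole genome for repeat pairs); it only classifies a given pair of equal-length sequences, and the function names, the max_mismatches parameter and the toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def classify_repeat(first, second, max_mismatches=0):
    """Return 'F', 'P', 'R' or 'C' for a repeat pair, or None if unrelated."""
    first, second = first.upper(), second.upper()
    if len(first) != len(second):
        return None
    candidates = [
        ("F", first),                              # forward: identical copy
        ("P", first[::-1].translate(COMPLEMENT)),  # palindromic: reverse complement
        ("R", first[::-1]),                        # reverse: reversed copy
        ("C", first.translate(COMPLEMENT)),        # complement: complemented copy
    ]
    for label, expected in candidates:
        if hamming(expected, second) <= max_mismatches:
            return label
    return None


for copy in ("AACGT", "ACGTT", "TGCAA", "TTGCA"):
    print(copy, classify_repeat("AACGT", copy))
```

Setting max_mismatches=3 would mirror the Hamming distance used in this study, although the way REPuter itself scores and merges repeats is not modelled here.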

As shown in Fig. 4, most of the dispersed repeat sequences in the mitogenome of P. deltoides are between 30 and 39 bp (272, 63.55%). The proportion of repeat sequences in regions of this length was significantly higher than in regions of other lengths, consistent with the findings in mitochondria of many plants that have been studied (Bi et al. 2016, 2020; Yang et al. 2021). Twenty-three repetitive sequences were longer than 100 bp, and only one was longer than 500 bp (550 bp, Chr1). Compared with other species of Salicaceae (P. davidiana, P. tremula, Salix suchowensis, S. brachista), the number of dispersed repeat sequences in Populus was significantly higher than in Salix, and the number of repeat sequences was well conserved within the same genus. Further analysis showed that the proportion of repeat sequences of most different length segments was very similar among Salicaceae species. The dispersed repetitive sequences in Salicaceae are generally less than 500 bp long, and the only ones larger than 500 bp are a 550-bp repetitive sequence in P. deltoides and a 15,592-bp sequence in S. suchowensis (Ye et al. 2017). Such long repeat sequences have been found in several species. These long repeats may be associated with subgenomic conformations or isomers in plant mitogenomes (Dong et al. 2018; Bi et al. 2020) and deserve attention for their impact on the structure of the plant mitogenome and on expansion of the mitogenome size in plants (Chang et al. 2013; Dong et al. 2018).

Fig. 4

Frequency distribution of dispersed repeats in the Populus deltoides mitogenome compared with four other Salicaceae plants

Prediction of RNA editing sites

Previous studies have shown that RNA editing events are among the necessary steps to regulate gene expression in plant growth and development and are also a common phenomenon in chloroplasts and mitochondria (Bock and Khan 2004; Chen et al. 2011; Raman and Park 2015). In this study, 330 RNA editing sites in 34 PCGs of P. deltoides were predicted by the PREP-Mt program. All were C-to-U edits and occurred at the first (119, 36.06%) or second (207, 62.73%) position of the codon, whereas no RNA editing was predicted at the third position, similar to the case in most angiosperms (Grewe et al. 2014; Kovar et al. 2018; Bi et al. 2020; Li et al. 2021; Yang et al. 2021). Interestingly, four codons in the PCGs of P. deltoides mitochondria underwent C-to-U editing at both the first and second positions (CCT → TTT), which has not been observed in other Salicaceae species. RNA editing can change the start and stop codons of PCGs; for example, the start codon of mttB in P. deltoides is ATT, which may be caused by RNA editing, but this was not predicted by the PREP-Mt program. Some studies have pointed out that the failure of PREP-Mt to predict editing at the third codon position may result from its algorithmic rules. The main limitation of PREP-Mt is that it cannot recognize silent editing sites: because most edits at the third codon position do not change the encoded amino acid, PREP-Mt cannot predict them reliably (Mower 2009). Therefore, identifying such cases requires experimental verification.
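The effect of codon position on the outcome of C-to-U editing can be illustrated with a short Python sketch. It does not reproduce the PREP-Mt algorithm; it only applies C-to-U edits to a single codon under the standard genetic code and reports whether the encoded amino acid changes. The function name apply_c_to_u is hypothetical, and T is used in place of U, as in the DNA-level notation of the text.

```python
from itertools import product

# Standard genetic code in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def apply_c_to_u(codon, positions):
    """Edit C to U (written T) at the given 1-based codon positions.

    Returns (edited_codon, amino_acid_before, amino_acid_after, is_silent).
    """
    codon = codon.upper().replace("U", "T")
    bases = list(codon)
    for pos in positions:
        if bases[pos - 1] != "C":
            raise ValueError(f"position {pos} of {codon} is not C")
        bases[pos - 1] = "T"
    edited = "".join(bases)
    before, after = CODON_TABLE[codon], CODON_TABLE[edited]
    return edited, before, after, before == after


print(apply_c_to_u("CCT", [1, 2]))  # double edit at the first and second positions
print(apply_c_to_u("TCA", [2]))     # second-position edit
print(apply_c_to_u("TTC", [3]))     # third-position edit
```

In these examples the first- and second-position edits change the amino acid (Pro to Phe and Ser to Leu), whereas the third-position edit is silent (Phe to Phe), which is the kind of site that a predictor relying on amino acid changes cannot detect.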

RNA editing sites were predicted for three Salicaceae species (P. deltoides, P. tremula, S. suchowensis) and two gymnosperms (G. biloba, P. taeda) with 330, 327, 328, 1291 and 1159 sites, respectively. As shown in Fig. 5, the number of RNA editing sites varied greatly among different genes in the P. deltoides mitogenome, with the most RNA editing sites occurring in the Complex I genes and the cytochrome c biosynthesis gene, while no RNA editing was found in atp1, atp8, atp9, cox1, cox2, or rps1.

Fig. 5

The distribution of RNA editing sites in the mitogenomes of three Salicaceae plants and two gymnosperms (Ginkgo biloba and Pinus taeda)

According to the predictions, the number of RNA editing sites in Salicaceae is extremely conserved, while the number of RNA editing sites in G. biloba and P. taeda is much larger than in Populus and Salix species. Analysis of RNA editing of PCGs common to the five species revealed that genes (atp1, atp8, cox1, cox2, etc.) with no RNA editing events in Populus and Salix had a large number of RNA editing sites in both G. biloba and P. taeda. Further studies revealed that the number of RNA editing sites in angiosperms was mostly below 500 and concentrated between 300 and 500, while in gymnosperms it was significantly greater. In addition to 1291 RNA editing sites in G. biloba and 1159 in P. taeda, past studies also predicted 1084 RNA editing sites in the mitochondrial PCGs of C. taitungensis (Chaw et al. 2008; Bi et al. 2020). The reason for this phenomenon may be that as land plants diverged from early stages and more-advanced angiosperms evolved, angiosperms lost editing sites due to random mutations and reverse transcription of genes, generating a more stable genome structure to provide intrinsic conditions for adaptation and resistance to various environmental changes (Rudinger et al. 2012).

In addition, we found that most of the RNA-edited amino acids (251, 76.06%) were hydrophobic, and this hydrophobicity is the main force driving protein-folding; thus, such RNA editing enables better protein function and thus better gene expression (Galtier 2011; Brenner et al. 2019).

Sequence transfer and collinearity between different genomes

The horizontal transfer of DNA fragments between organellar genomes and the nuclear genome is an important event in plant evolution. Current genome data and experimental studies show that the main direction of sequence transfer between the three genomes is from the organellar genomes to the nuclear genome and from the nuclear genome and chloroplast genome to the mitogenome. In higher plants, sequences of the nuclear genome and mitogenome have not been found to be transferred to the chloroplast genome (Bergthorsson et al. 2003; Leister 2005; Martin et al. 1998).

To better understand the horizontal sequence transfer events in P. deltoides, we used the mitogenome as a query and compared it to the nuclear genome and the chloroplast genome separately. The results showed that 830 fragments, totaling 471 kb of nuclear genome sequences, were transferred to the mitogenome in P. deltoides. Further analysis showed that each chromosome in the P. deltoides nuclear genome had sequences transferred to the mitogenome, with chromosome 18 having the longest transfer sequence (77,364 bp), much longer than those of the other chromosomes; the smallest transfer sequence was on chromosome 14, only 6241 bp. In addition, statistical analysis of the lengths of the transferred fragments showed that the fragments transferred from the nuclear genome to the mitochondria were mainly in the range of 200 − 500 bp (674, 81.20%), similar to the findings for other angiosperm mitogenomes (Bi et al. 2016). In addition, four fragments were greater than 10,000 bp, with the longest being 20,520 bp. The gene sequences were aligned with those of the P. deltoides nuclear genome, which shares 59,788 bp of gene sequences with the mitogenome, including one complete gene sequence (trnN-GUU) and partial sequences of 25 genes (atp1, atp4, atp6, atp8, nad1, nad2, nad4, nad4L, nad5, nad7, cox1, cox2, cox3, cob, ccmFc, sdh4, matR, rps1, rps3, rps4, rrnL, trnC-GCA, trnY-GUA, trnE-UUC, and trnD-GUC).

As shown in Fig. 6, comparison of the mitochondrial and chloroplast genomes of P. deltoides showed that a total of 51 fragments (32.97 kb) were transferred from the chloroplast genome to the mitogenome, accounting for 21.01% and 4.11% of the chloroplast genome and mitogenome, respectively, as detailed in Supplementary Table S3. According to the annotation of the chloroplast genome and mitogenome of P. deltoides, 27 intact chloroplast genes were transferred to the mitogenome: 14 PCGs and 13 tRNA genes. Past studies have shown that tRNAs in plant mitochondria have a dual origin; i.e., one part is inherited from the ancestor of the mitochondria, and the other part is derived from horizontal gene transfer from chloroplasts (Sprinzl and Vassilenko 2005). There are 18 different tRNAs in the mitogenome of P. deltoides, of which eight (44.44%) are derived from chloroplast gene transfer, similar to the values in G. raimondii, Boea hygrometrica and some other species (Zhang et al. 2011; Bi et al. 2016). Further comparison with the gymnosperms G. biloba and C. taitungensis showed that less than 20% of the tRNAs in their chloroplast genomes migrated to mitochondria, significantly less than for many angiosperms that have been studied. No chloroplast-to-mitochondria tRNA gene transfer was found in Marchantia polymorpha, the first land plant whose mitogenome was sequenced (Oda et al. 1992; Chaw et al. 2008; Guo et al. 2016). This phenomenon suggests that the transfer of tRNAs from chloroplasts to mitochondria has become more frequent as plants evolved from lower to higher levels, probably because lower plants have a complete set of mitochondrial tRNA genes, whereas in higher plants the native mitochondrial tRNAs are being lost and replaced by homologous tRNAs of chloroplast origin to meet the needs of both chloroplast and mitochondrial protein synthesis (Glover et al. 2001).

Fig. 6

Collinearity of the mitogenome and chloroplast genome of Populus deltoides. In the purple region, identity is greater than 95%, in the green between 90 and 95%, and in the blue region between 85 and 90%

To further explore the similarities and differences between the multiring and single-ring structures of the mitogenome of Populus, the mitogenomes of P. deltoides and P. tremula (783.44 kb) were compared and subjected to collinearity analysis. Figure 7 shows that almost all fragments of the mitogenome of P. tremula mapped to the three circular chromosomes of P. deltoides. However, the order and direction of many large collinear fragments are different. Interestingly, the collinearity analysis of P. tremula and P. davidiana showed that these two species had a good collinear relationship, with sequence fragments essentially the same in order and direction (Supplementary Fig. S1). This contrast may arise because P. tremula and P. davidiana belong to the same section whereas P. deltoides belongs to a different one, and the mitogenome of poplar appears to be extensively rearranged between sections (Yang et al. 2016; Cole et al. 2018).

Fig. 7

Collinearity of the mitogenomes of Populus deltoides and P. tremula

Phylogenetic analysis

Because the plant mitogenome also carries relevant genetic information, it is increasingly used for the study of phylogeny. With the improvement of sequencing and assembly technology, increasing attention has been given to the phylogenetic analysis of plants using mitochondrial information (Bi et al. 2016; Choi et al. 2019; Yang et al. 2021). To determine the phylogenetic position of P. deltoides, 42 mitogenome data sets from among eudicots, monocots, magnoliids and gymnosperms were obtained from the NCBI genome database (Supplementary Table S1). Based on 20 single-copy homologous genes shared by 43 species (atp1, atp4, atp6, atp8, atp9, ccmB, ccmC, ccmFc, ccmFn, cob, cox1, cox2, cox3, nad2, nad3, nad4, nad4L, nad6, nad7, and nad9), a maximum likelihood (ML) tree was constructed with the gymnosperms G. biloba and P. taeda as outgroups. As shown in Fig. 8, most of the 40 branch nodes in the phylogenetic tree have bootstrap values above 90% (34, 85%), and 26 of them have bootstrap values of 100%. It can be seen from the ML tree that the mitochondrial PCGs clearly distinguish rosids, asterids, monocots, magnoliids and gymnosperms and resolve each order, family and genus, which is consistent with the APG IV classification system (Byng et al. 2016; APG 2016). In the phylogenetic tree, the genera Populus and Salix of the family Salicaceae are separated with 100% bootstrap support. Further study found that within the genus Populus, P. davidiana and P. tremula clustered together and are closely related to P. alba, while P. deltoides is on a separate branch. The reason for this is that Populus is widely distributed and has many species. P. davidiana, P. tremula, and P. alba all belong to Populus sect. Populus, so their genetic relationship is relatively close, while P. deltoides belongs to Populus sect. Aigeiros, so it is more distantly related to the other three species, which is well reflected in the evolutionary tree.
The above results also confirm the reliability of the ML tree based on plant mitochondrial PCGs, and with the rapid development of plant mitogenome sequencing and assembly technologies, extensive evolutionary research on plants through mitochondria will become possible.

Fig. 8

Maximum-likelihood phylogenetic tree based on 20 single-copy orthologous genes shared among 43 species. Ginkgo biloba and Pinus taeda served as outgroups. The numbers on each node are bootstrap support values

Conclusion

We assembled and annotated the mitogenome of P. deltoides I-69 and systematically analyzed it based on the obtained genome and annotation information. The P. deltoides I-69 mitogenome consists of three circular chromosome structures with a genome size of 802,637 bp, which is similar to the mitogenomes of other poplars that have been studied. The 58 genes were distributed on three chromosomes, and the number of genes on each chromosome was positively correlated with the size of the chromosome. RNA editing and codon usage analysis showed that the mitogenome of poplar was relatively conserved. Further analysis of Arabidopsis thaliana, Gossypium raimondii and other angiosperms and gymnosperms, such as Ginkgo biloba and Cycas taitungensis, showed significantly fewer RNA editing sites in the mitogenomes of poplar and other angiosperms than in those of gymnosperms, such as Ginkgo biloba. Phylogenetic analysis of the mitogenome of P. deltoides and 42 other species based on 20 common PCGs showed that the evolutionary status of this species could be resolved well, and the section of Populus was verified at the genome level. Sequence transfer and collinear analyses showed extensive rearrangements among different sections in Populus, which provided new clues and evidence for the study of Populus phylogeny. In addition, tRNA translocation analysis revealed that as plants evolved from lower to higher levels, tRNAs in chloroplasts were transferred to mitochondria more frequently. This study is the first to reveal the multichromosomal conformation of mitochondria in Populus, elucidating more details on the phylogenetic relationships of Populus species and on genetic information for the study of Populus species.