Comparative chloroplast genomics reveals the phylogeny and the adaptive evolution of Begonia in China

Xiong, Chao; Huang, Yang; Li, Zhenglong; Wu, Lan; Liu, Zhiguo; Zhu, Wenjun; Li, Jianhui; Xu, Ran; Hong, Xin

doi:10.1186/s12864-023-09563-3

Research
Open access
Published: 27 October 2023

Volume 24, article number 648, (2023)

BMC Genomics

Chao Xiong¹^na1,
Yang Huang³^na1,
Zhenglong Li²,
Lan Wu⁴,
Zhiguo Liu¹,
Wenjun Zhu¹,
Jianhui Li⁵,
Ran Xu¹ &
…
Xin Hong²

Abstract

Background

Begonia species are common shade plants that are mostly found in southwest China. Despite their medicinal and ornamental uses, they have not been well studied, in part because frequent interspecific hybridization leads to gene introgression, decreased adaptability, and restricted availability.

Results

To understand the patterns of mutation in the chloroplast genomes of different Begonia species, as well as their evolutionary relationships, we collected seven Begonia species in China and sequenced their chloroplast genomes. The chloroplast genomes of Begonia species exhibit a quadripartite structure (157,634–169,694 bp), consisting of a pair of inverted repeats (IR: 26,529–37,674 bp), a large single-copy region (LSC: 75,477–86,500 bp), and a small single-copy region (SSC: 17,861–18,367 bp). The chloroplast genomes contain 128–143 genes (comprising 82–93 protein-coding genes, 8 ribosomal RNAs, and 36–43 transfer RNAs). Comparative analyses show that genome structure is relatively conserved across this taxon. A total of six substantially divergent DNA regions (trnT-UGU-trnL-UAA, atpF-atpH, ycf4-cemA, psbC-trnS-UGA, rpl32-trnL-UAG, and ccsA-ndhD) were found in the seventeen chloroplast genomes analyzed (the seven sequenced here and ten available from NCBI). These regions are suitable for species identification and phylogeographic analysis. Phylogenetic analysis shows that Begonia species adapted to comparable environments grouped together in small clades and that all Begonia species formed a single clade in the phylogenetic tree, supporting the monophyly of the genus. In addition, positively selected sites were discovered in eight genes (rpoC1, rpoB, psbE, psbK, petA, rps12, rpl2, and rpl22), the majority of which are involved in protein synthesis and photosynthesis.

Conclusion

Using these genome resources, we can resolve deep-level phylogenetic relationships among Begonia species and related families, leading to a better understanding of evolutionary processes. In addition to enhancing species identification and phylogenetic resolution, these results demonstrate the utility of complete chloroplast genomes in phylogenetically and taxonomically challenging plant groups.

Background

Sunlight is essential for the growth of most heliophilous plants, and in severe cases they may even die from a lack of light. There are, however, a variety of Begonia species that can adapt to a range of light levels and have a wide range of forms. They are widely distributed and are often found attached to caves in humid and shady environments [1]. Listed among the ten largest angiosperm genera, Begonia is a member of the Begoniaceae family with over 2,000 described species [2]. Begonia diversity is unevenly distributed across tropical and subtropical regions. American and Asian species are the most diverse, with over 600 species each, while African species are relatively scarce, with only 160 species, and the genus is absent from Australia [3]. A total of 130 Begonia species are included in the Flora of China, and they occur naturally south of the Yangtze River basin. Because of its wide variety of morphological characteristics, Begonia is an excellent genus for studying shade-adapted plants in China. Particularly distinctive are its asymmetrical leaves, monoecious flowers, and three-winged capsules, which are dry and papillose [4]. Because it is a megadiverse, pantropically distributed genus, Begonia is an excellent system for examining the processes and patterns underlying the generation of biodiversity [5]. Analyzing plastid genome (plastome) structure and repeat content is necessary to gain insight into how the plastid genome might play a role in species evolution.

Plastomes can be useful for addressing adaptive evolution in plants. Under different light intensities or living environments, selection leaves signatures in some plastome genes. These genes are associated with photosynthesis and the genetic system, and they play an important role in helping plants adapt to various environments [6, 7]. In addition, comparative genomic analysis of chloroplasts can be used to identify highly variable regions and to develop specific molecular markers for populations or species for use in species identification. With the decrease in sequencing costs and the continuous improvement of sequencing technology in recent years, more and more plant plastomes have been sequenced and applied to species identification and phylogenetic studies. However, few chloroplast genomes have been published for Begonia, and most studies have focused on species biodiversity using DNA barcode fragments. For instance, to resolve section-level patterns of phylogenetic diversity in South American Begonia, Moonlight et al. [8] used three plastid regions (the ndhA intron, the ndhF-rpl32 spacer, and the rpl32-trnL spacer), but these gave no robust resolution at the species level, which restricts analysis of the adaptation of Begonia to its geographical environment. A comparative study of chloroplast genomes is therefore of great value for understanding the evolution of the chloroplast genome and reconstructing the phylogenetic relationships of Begonia.

In this study, we focus on the mechanisms of diversity formation and adaptive evolution in Begonia, which will be conducive to the study of evolution at different levels, from population to species to the whole genus. Here we present chloroplast genome information for seven different species of Begonia, which will serve as the foundation for a new reference sequence database and be used to map the genetic structure of the genus. These genetic resources provide tools for discovering the functional genetic basis of population variation in Begonia species and mapping adaptation analyses at the interspecific and population levels.

Results

Plastid genome features

In this study, seven Begonia plants were collected, and their chloroplast genomes were sequenced and assembled. All 17 Begonia cp genomes (the other 10 cp genome sequences are available from NCBI) are circular DNA molecules with a typical quadripartite structure, which includes a small single-copy region (SSC), a large single-copy region (LSC), and two inverted repeat regions (IRa and IRb) (Fig. 1). Among the 17 Begonia species, the complete chloroplast genomes ranged from 157,634 bp (B. guangxiensis) to 169,694 bp (B. smithiana) in length (Table 1). The LSC ranged from 75,477 bp (B. cavaleriei) to 86,500 bp (B. guangxiensis), the SSC from 17,861 bp (B. emeiensis) to 18,367 bp (B. cathayana), and the IR from 26,529 bp (B. guangxiensis) to 37,674 bp (B. versicolor). The cp genomes of Begonia have similar GC contents: B. leprosa had the lowest (35.47%) and B. obsolescens the highest (35.90%).
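The two summary statistics in this paragraph, GC content and region lengths, are simple to recompute from an assembled sequence. The following Python sketch is illustrative only and is not the authors' pipeline; the function names are hypothetical. It assumes that the four regions tile the circular genome exactly, so that total length = LSC + SSC + 2 × IR, and it counts only unambiguous bases when computing GC content. Under that assumption, the lengths reported above for B. guangxiensis imply an SSC within the reported SSC range; Table 1 remains the authoritative source for the actual value.

```python
def gc_content(seq):
    """Return GC content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def implied_ssc_length(total, lsc, ir):
    """SSC length implied by total = LSC + SSC + 2 * IR (assumed exact tiling)."""
    return total - lsc - 2 * ir


# Lengths (bp) quoted in the text for B. guangxiensis.
ssc = implied_ssc_length(total=157_634, lsc=86_500, ir=26_529)
print(ssc)                     # 18076, inside the reported 17,861-18,367 bp range
print(gc_content("ATGCGCAT"))  # toy sequence, 4 of 8 bases are G or C: 50.0
```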

Table 1 Summary of the chloroplast genomes of seventeen Begonia species

A total of 128–142 genes were identified based on gene annotation, comprising 82–93 protein-coding genes, 36–43 transfer RNAs (tRNAs), and 8 ribosomal RNAs (rRNAs) (Table 1). The number of genes varies between species as a result of IR contraction and expansion. These genes were separated into three groups: 59 genes are related to self-replication (the large subunit of the ribosome, the small subunit of the ribosome, and RNA polymerase), 43 genes are involved in photosynthesis (photosystem I, photosystem II, cytochrome b/f complex, ATP synthase, Rubisco large subunit, and NADPH dehydrogenase), and the remaining genes encode other proteins (ATP-dependent protease, maturase, acetyl-CoA carboxylase, cytochrome c biogenesis protein, and inner membrane protein) (Table 2).

Table 2 Genes in the chloroplast genome of seventeen Begonia species

In addition, the cp genome has four borders among the LSC, IRb, IRa, and SSC regions: the JLB line for the LSC/IRb border, the JSB line for the IRb/SSC border, the JSA line for the SSC/IRa border, and the JLA line for the IRa/LSC border. The borders of the seventeen Begonia cp genomes were compared (Fig. 2). At the LSC/IRb boundary, the rps19 and trnG genes were located at the JLB line in all species except B. leprosa and B. obsolescens, in which the rps19 gene was located entirely within the LSC region, 1–6 bp from the LSC/IRb boundary, owing to expansion of the LSC region boundary. The IRb/SSC boundary exhibited greater variability. In B. guangxiensis and B. pulchrifolia, the JSB line was located at the overlap of the ycf1 and ndhF genes. In B. obsolescens, the JSB line was located to the left of the ycf1 pseudogene, approximately 4 bp away, whereas in the other species the JSB line was located within the ycf1 pseudogene, which extended 3–23 bp into the SSC region. The ycf1 gene was also observed at the SSC/IRa boundary, with a segment of 1,408 to 1,426 bp lying within the IRa region. At the IRa/LSC boundary, the trnG and trnR genes were located at the JLA line in all species except B. obsolescens and B. guangxiensis. There were 5–7 bp between trnR and the LSC/IRb borders. Additionally, the trnG copies were fully retained within the IRa region, at a distance of 399–437 bp from the JLA line.

Repeat sequence analysis

In cp genomes, repetitive sequences are essential for genome evolution and rearrangements. A total of 897 SSRs were found in the seventeen Begonia chloroplast genomes (Fig. 3 and Supplementary Table S1). The fewest SSRs (48) were found in the chloroplast genomes of B. ferox, B. gulongshanensis, and B. versicolor, and the most (63) were found in the B. emeiensis chloroplast genome. For each species of Begonia, mononucleotide repeats ranged from 39 to 56, dinucleotide repeats from 5 to 7, and trinucleotide, tetranucleotide, and hexanucleotide repeats from 0 to 1 each; no pentanucleotide repeats were detected.
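To make the SSR categories concrete, the sketch below scans a sequence for perfect tandem repeats of 1 to 6 nt motifs using regular expressions. It is a minimal illustration, not the tool used in the study. The minimum repeat counts in `MIN_REPEATS` are assumed values chosen for the example, since the thresholds applied in this work are not stated in this section, and `find_ssrs` is a hypothetical name.

```python
import re
from collections import Counter

# Assumed minimum repeat counts per motif length (1-6 nt). These are
# illustrative values, not thresholds taken from the study.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect SSRs, shortest motifs first."""
    seq = seq.upper()
    hits = []
    for k in sorted(min_repeats):
        # A k-nt motif followed by at least (min_repeats[k] - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats[k] - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"); those are reported at the shorter length.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return hits


toy = "GGG" + "A" * 12 + "C" + "AT" * 6 + "G"  # toy sequence, not real data
ssrs = find_ssrs(toy)
print(ssrs)                                  # [(3, 'A', 12), (16, 'AT', 6)]
print(Counter(len(m) for _, m, _ in ssrs))   # SSR counts by motif length
```

Tallying hits by motif length, as in the last line, gives per-genome counts of mono- to hexanucleotide repeats of the kind summarized above.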

Furthermore, oligonucleotide repeat analysis revealed that the seventeen cp genomes differ in the number and arrangement of repeat types (Supplementary Table S2). Most repeat sequences were 30–36 bp long. Palindromic repeats (P-repeats) and forward repeats (F-repeats) occurred more frequently than reverse repeats (R-repeats) and complement repeats (C-repeats). B. pulchrifolia had the fewest repeats (24), whereas B. umbraculifolia, B. arachnoidea, and B. gulongshanensis had the most (36). The examination of repeat sequences will be useful for understanding genetic diversity in Begonia.
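The four repeat types differ only in how the second copy relates to the first. The sketch below encodes the usual definitions (forward: identical; reverse: reversed; complement: complemented; palindromic: reverse-complemented) for exact matches. It illustrates the classification only, using a hypothetical function name; it does not search a genome for repeats and is not the software used in the study, which is not named in this section.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(first, second):
    """Classify an exact repeat pair as 'F', 'P', 'R' or 'C' (None if unrelated)."""
    first, second = first.upper(), second.upper()
    comp = first.translate(COMPLEMENT)
    if second == first:
        return "F"  # forward: same sequence, same orientation
    if second == comp[::-1]:
        return "P"  # palindromic: reverse complement
    if second == first[::-1]:
        return "R"  # reverse: same strand, reversed order
    if second == comp:
        return "C"  # complement: complemented, not reversed
    return None


unit = "AACGT"  # toy repeat unit, not taken from the data
for copy in ("AACGT", "ACGTT", "TGCAA", "TTGCA"):
    print(copy, classify_repeat(unit, copy))
```

Some sequences satisfy more than one definition (a unit equal to its own reverse complement, for example), so the order of the checks is a design choice in this sketch.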

Comparative chloroplast genome analysis

Multiple sequence alignment of the chloroplast genomes of the 17 Begonia species was performed using B. coptidifolia as the reference. The results revealed that the non-coding regions were more divergent than the coding regions (Fig. 4). In the chloroplast genome alignment, we observed sixteen divergent regions: 12 regions (atpH-atpI, atpF-atpH, trnC-GCA-petN, petN-psbM, ycf4-cemA, rbcL-accD, psbE-petL, petD-rpoA, rps19-trnG-UCC, trnT-UGU-trnL-UAA, trnE-UUC-trnT-GGU, and trnD-GUC-trnY-GUA) were identified as the primary sources of divergence in the non-coding regions, while atpF, petL, ycf1, and ndhF were identified as the substantially divergent sequences in the coding regions (Fig. 5). Notably, the nucleotide sequences of these regions differed by more than 50% from those of the reference species, B. coptidifolia. These sequences may be useful for identifying Begonia species. We further investigated sequence variability by calculating nucleotide diversity (Pi) among the 17 species of Begonia. The results revealed that the coding regions of the SSC region exhibited the highest average Pi values (Fig. 5A), followed by the coding regions of the IR and LSC regions (Fig. 5B, C). Among the 12 spacer regions analyzed, the Pi values ranged from 0.01084 (petN-psbM) to 0.06525 (trnT-UGU-trnL-UAA) (Fig. 5D). The ten coding genes with the highest polymorphism were rps19, rps3, rpl32, rpl22, ccsA, ndhE, atpF, petL, atpE, and rps8 (Pi > 0.01). Pi values were also computed for the tRNA and rRNA genes; among these, trnL-UAA had a relatively high Pi value of 0.01538. Finally, we identified six highly variable loci with Pi values ranging from 0.03 to 0.07: trnT-UGU-trnL-UAA (Pi = 0.06525), atpF-atpH (Pi = 0.03402), ycf4-cemA (Pi = 0.03353), psbC-trnS-UGA (Pi = 0.03245), rpl32-trnL-UAG (Pi = 0.03162), and ccsA-ndhD (Pi = 0.03027) (detailed information is listed in Supplementary Table S3). These highly variable regions have the potential to serve as molecular markers for DNA barcoding applications.
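Pi here is the average number of pairwise nucleotide differences per site across the aligned sequences. The sketch below computes it for a small alignment and for sliding windows. It is a minimal illustration under stated assumptions: columns containing gaps or ambiguous bases in any sequence are dropped (complete deletion), the function names are hypothetical, and the window and step sizes are placeholders because the settings used in the study are not given in this section. Dedicated population-genetics software may handle gaps differently and give slightly different values.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise differences per site over gap-free alignment columns."""
    aligned = [s.upper() for s in aligned]
    n = len(aligned)
    if n < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must have equal aligned length")
    # Complete deletion: keep columns where every sequence has A, C, G or T.
    cols = [i for i in range(length) if all(s[i] in "ACGT" for s in aligned)]
    if not cols:
        return 0.0
    diffs = sum(
        sum(a[i] != b[i] for i in cols) for a, b in combinations(aligned, 2)
    )
    pairs = n * (n - 1) // 2
    return diffs / (pairs * len(cols))


def sliding_pi(aligned, window, step):
    """Return (window_start, Pi) for each full window along the alignment."""
    length = len(aligned[0])
    return [
        (start, nucleotide_diversity([s[start:start + window] for s in aligned]))
        for start in range(0, length - window + 1, step)
    ]


toy = ["ACGT", "ACGA", "ACGT"]          # toy alignment, not real data
print(nucleotide_diversity(toy))        # 2 differences / (3 pairs * 4 sites)
print(sliding_pi(["ACGT", "ACGA"], window=2, step=2))  # [(0, 0.0), (2, 0.5)]
```

Applied to one spacer or gene at a time, the first function yields per-locus values comparable in kind to those listed above; the second gives a genome-wide profile from which variable regions can be picked out.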

Phylogenetic relationship

To explore the phylogenetic position of the genus, ML trees were built using 76 protein-coding genes from the chloroplast genomes of 42 species, including 17 species of Begonia (Fig. 6); the chloroplast genomes of 35 of the 42 species were obtained from NCBI (Supplementary Table S4). All nodes in the phylogenetic tree had bootstrap support above 50%, and each genus formed a clade (bootstrap values of 100%). Within the Begonia clade, two major subclades were separated with 100% bootstrap support. In one subclade, B. coptidifolia and B. smithiana form a clade, which then groups successively with B. emeiensis, B. pulchrifolia, B. cathayana, B. handelii, B. versicolor, and B. grandis. In the other subclade, the clade formed by B. guangxiensis and B. umbraculifolia clustered with B. cavaleriei, B. ferox, and B. obsolescens, and then shared a sister relationship with B. asteropyrifolia, B. arachnoidea, and B. gulongshanensis.

Adaptive evolution analysis

Using the Bayes empirical Bayes (BEB) test, positive selection was detected in eight genes (psbE, psbK, rpl2, rpl22, rpoC1, petA, rps12, and rpoB) with high posterior probability (> 95%) among the 76 cp genome protein-coding genes of the seventeen Begonia species (Fig. 7). The rpoC1 and rpoB genes encode RNA polymerase subunits. Two amino acid positions (the 224th and 566th codons) in rpoC1 were identified as positively selected sites (Fig. 7A), and spatial analysis placed both sites in random coils (Fig. 8A). In rpoB, seven sites were found (Fig. 7B). According to the predicted protein structure, the majority of these positively selected sites were located in α-helices, followed by random coils and β-sheets (Fig. 8B). Three of the eight genes are related to photosynthesis: psbE and psbK, which encode photosystem II subunits, and petA, which encodes a subunit of the cytochrome b/f complex. The positively selected 59th and 82nd amino acid positions in psbE were located in a random coil and an α-helix, respectively (Fig. 7C and Fig. 8C). The fifth amino acid position of the protein encoded by psbK was found to be positively selected (Fig. 7D), and spatial analysis placed it inside a random coil (Fig. 8D). The predicted protein structure placed the positively selected amino acid position of petA (162nd) in an α-helix (Fig. 7E and Fig. 8E).

The other three genes encode proteins involved in protein synthesis: rps12, which encodes a ribosomal protein of the small subunit (RPS), and rpl2 and rpl22, which encode ribosomal proteins of the large subunit (RPL). In the rps12 protein, two positively selected sites (25th and 118th) were discovered, and spatial analysis showed that they were located in the α-helix (Fig. 7F and Fig. 8F). The rpl2 protein has one positively selected site (131st) (Fig. 7G and Fig. 8G), and the random coil also contains three positively selected amino acid sites in rpl22 (37th, 105th, and 129th) (Fig. 7H and Fig. 8H).