1 Introduction

Saxifraga L. is the largest genus of Saxifragaceae, comprising 450–500 species in at least 13 sections (Tkach et al. 2015; Ebersbach et al. 2017a; Gao et al. 2017; Li et al. 2018). The most species-rich section of Saxifraga is Saxifraga sect. Ciliatae Haw. (ca. 175 species; Pan et al. 2001), whose center of diversity is the Qinghai-Tibetan Plateau (QTP) and the Himalayas (Gao et al. 2015; Ebersbach et al. 2018). In addition, Saxifraga sect. Ciliatae is thought to have undergone recent rapid radiations in the QTP region, particularly within Saxifraga sect. Ciliatae subsect. Hirculoideae Engl. & Irmsch. (Gao et al. 2015; Ebersbach et al. 2017b, 2018). Rapid radiations are usually associated with low genetic divergence among closely related species and clades (DeChaine et al. 2013), so phylogenetic relationships often cannot be well resolved with a limited number of DNA markers. This is also the case in Saxifraga sect. Ciliatae. In Saxifraga sect. Ciliatae subsect. Hirculoideae, relationships among its ca. 110 species remain largely unresolved with traditional universal DNA markers (Gao et al. 2015). Furthermore, enigmatic species of this subsection cannot be identified with universal barcodes. New genomic resources are therefore urgently needed to support phylogenetic reconstruction and taxonomic classification of Saxifraga sect. Ciliatae. Here, we report the complete chloroplast genome of Saxifraga sinomontana, the first for Saxifraga sect. Ciliatae.

Analyses of complete chloroplast genomes can significantly improve the resolution of phylogenetic relationships in large, complex plant lineages (Jansen et al. 2007; Doorduin et al. 2011), even among enigmatic taxa (Dong et al. 2018). In most angiosperms, the chloroplast genome is a closed circular DNA molecule with a conserved quadripartite structure, consisting of a large single-copy (LSC) region and a small single-copy (SSC) region separated by a pair of inverted repeats (IRs) (Palmer 1985; Wicke et al. 2011). Angiosperm chloroplast genomes typically range from 120 to 160 kb in size and encode 110–130 genes (Palmer 1985). Owing to their uniparental inheritance, haploid nature, conserved structure and gene content, and small size, chloroplast genomes have been widely applied in phylogenetic reconstruction (Burke et al. 2016; Simmons 2017; Dong et al. 2018), molecular evolution (Huang et al. 2014; Walker et al. 2014) and super-barcoding studies (Hernández-León et al. 2013). With the advent of next-generation sequencing (NGS), the number of complete chloroplast genome sequences has grown rapidly. However, only one chloroplast genome of Saxifraga has been reported to date, that of Saxifraga stolonifera Curtis (GenBank accession no. NC_037882), a member of S. sect. Irregulares Haw. No chloroplast genome has been reported for sect. Ciliatae, the most species-rich section of Saxifraga.

Saxifraga sinomontana J.-T. Pan & Gornall, which belongs to S. sect. Ciliatae subsect. Hirculoideae, is a prominent element of high-altitude habitats throughout the QTP-Himalayas region. This perennial herb is extraordinarily variable in morphology and shows a high level of intraspecific genetic diversity; it is considered to have undergone recent rapid intraspecific differentiation associated with Quaternary climatic oscillations (Li et al. 2018). Here, we sequenced and de novo assembled the complete chloroplast genome of S. sinomontana using an Illumina sequencing platform. We compared this newly sequenced chloroplast genome with six other published Saxifragaceae chloroplast genomes and assessed its potential utility for phylogenetic studies of Saxifraga and Saxifragaceae.

2 Materials and methods

Sample collection, genome sequencing and assembly

– Fresh leaves of Saxifraga sinomontana were sampled at Xuebudala Pass, Longzi County, Xizang Autonomous Region, China (28°37′58.6″N, 92°13′09.2″E). Leaves were collected from a single individual and dried in silica gel. A voucher specimen was deposited in the herbarium of the Northwest Institute of Plateau Biology (HNWP), Xining, Qinghai, China.

Total genomic DNA was isolated from silica-dried leaves using a modified CTAB method (Doyle and Doyle 1987). A genomic library was then prepared using the TruSeq Library Construction Kit (Illumina, San Diego, California, USA) following the manufacturer’s instructions. Briefly, ca. 5 μg of genomic DNA was fragmented by sonication, purified using the CASpure PCR Purification Kit (Chaoshi-Bio, Shanghai, China), end-repaired and A-tailed at the 3’ ends. The DNA fragments were then ligated to adapters, size-selected by agarose gel electrophoresis and amplified by PCR to generate the sequencing library. Total DNA (genomic and chloroplast) was sequenced on the Illumina NovaSeq platform (Novogene, Tianjin, China), yielding ca. 5 Gb of 150-bp paired-end reads from a library with an insert size of approximately 350 bp.

Reads were mapped against the complete chloroplast genome sequence of S. stolonifera (NC_037882) using the BWA-MEM algorithm as implemented in BWA 0.7.17 (Li and Durbin 2009) to recover chloroplast-derived reads. A total of 7,204,234 paired-end reads were recovered and then assembled using SPAdes 3.13.0 (Bankevich et al. 2012). The resulting six contigs, ranging in size from 2993 to 60,980 bp, were further scaffolded using SSPACE-basic 2.0 (Boetzer et al. 2011). Gaps between the de novo contigs, as well as the regions spanning contig junctions, were filled and verified by Sanger sequencing. Primers were designed using Primer-BLAST (Ye et al. 2012) and are listed in Supplementary Table S1.
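As a rough check on the sequencing effort, the figures above can be turned into an approximate read-pair count and a mean chloroplast coverage. This is back-of-the-envelope arithmetic, not part of the published workflow. It assumes that the 7,204,234 recovered reads are counted as individual 150-bp reads (if they are read pairs, the depth doubles) and uses the final genome length of 147,240 bp reported in the Results.

```python
# Back-of-the-envelope sequencing figures for the S. sinomontana run.
# Inputs come from the text; counting 7,204,234 as individual 150-bp reads
# (not read pairs) is an assumption of this sketch.

RAW_YIELD_BP = 5e9        # ca. 5 Gb of raw data
READ_LENGTH_BP = 150      # 150-bp paired-end reads
CP_READS = 7_204_234      # reads recovered by mapping to NC_037882
CP_GENOME_BP = 147_240    # final assembly length


def approx_read_pairs(yield_bp: float, read_len: int) -> float:
    """Each pair contributes two reads of read_len bases."""
    return yield_bp / (2 * read_len)


def mean_depth(n_reads: int, read_len: int, genome_bp: int) -> float:
    """Lander-Waterman style mean depth: total read bases / genome length."""
    return n_reads * read_len / genome_bp


if __name__ == "__main__":
    print(f"read pairs ~ {approx_read_pairs(RAW_YIELD_BP, READ_LENGTH_BP):,.0f}")  # 16,666,667
    print(f"mean cp depth ~ {mean_depth(CP_READS, READ_LENGTH_BP, CP_GENOME_BP):,.0f}x")  # 7,339x
```

Under these assumptions the run corresponds to roughly 16.7 million read pairs and a mean plastome depth of about 7300×. Mapping to a congeneric reference can also capture plastid-like reads of nuclear or mitochondrial origin, so the depth is best read as an upper estimate.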

Genome annotation and simple sequence repeats analysis

– The genome was annotated using GeSeq (Tillich et al. 2017). The predicted annotations were checked by BLAST searches against the chloroplast (cp) genomes of closely related species to validate the positions of questionable start and stop codons. The circular cp genome map was drawn using OGDRAW (Greiner et al. 2019). The complete chloroplast genome of S. sinomontana was deposited in GenBank under accession number MN104589.
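Automated annotations are commonly screened for questionable start and stop codons before manual correction. The following is a minimal, hypothetical sketch of such a check on a single coding sequence; it is not the logic of GeSeq or BLAST. The set of accepted start codons (ATG and GTG) is an assumption: some plastid genes use other start codons (for example, ACG converted to AUG by RNA editing), and these would simply be flagged for manual review.

```python
# Minimal CDS sanity check for flagging questionable start/stop codons.
# The accepted start codons are an assumption of this sketch.

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG"}  # assumption; edited ACG starts will be flagged


def check_cds(cds: str) -> list[str]:
    """Return a list of problems found in a CDS (empty list = no problems found)."""
    seq = cds.upper()
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length not a multiple of 3")
    # Split into complete codons only; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons:
        return problems + ["empty CDS"]
    if codons[0] not in START_CODONS:
        problems.append(f"unusual start codon {codons[0]}")
    if codons[-1] not in STOP_CODONS:
        problems.append(f"no terminal stop codon (ends with {codons[-1]})")
    internal = [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]
    if internal:
        problems.append(f"internal stop codon(s) at codon index {internal}")
    return problems


if __name__ == "__main__":
    print(check_cds("ATGAAATAA"))     # []
    print(check_cds("ATGTAAAAATAG"))  # ['internal stop codon(s) at codon index [1]']
```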

Simple sequence repeats (SSRs) were detected using MISA (Thiel et al. 2003) with minimum repeat numbers of 10, 5, 4, 3, 3 and 3 for mono-, di-, tri-, tetra-, penta- and hexanucleotide motifs, respectively.
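To make these thresholds concrete, here is a minimal MISA-like scan for perfect SSRs that applies the same minimum repeat numbers. It is an illustrative sketch, not MISA itself: the function names are hypothetical, compound SSRs are not merged, and when candidate SSRs overlap, the one found with the shorter motif is kept (motif lengths are scanned from 1 to 6).

```python
# MISA-like scan for perfect SSRs using the minimum repeat numbers in the text.
# Simplifications (assumptions): no compound-SSR merging; overlapping hits
# found with a longer motif are skipped.

MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def find_ssrs(seq: str) -> list[tuple[int, int, str, int]]:
    """Return (start, end, motif, repeats), 0-based half-open, sorted by start."""
    seq = seq.upper()
    n = len(seq)
    claimed = [False] * n
    hits = []
    for k in sorted(MIN_REPEATS):
        i = 0
        while i + k <= n:
            motif = seq[i:i + k]
            j = i + k
            while seq[j:j + k] == motif:  # extend while the motif repeats in tandem
                j += k
            reps = (j - i) // k
            if (reps >= MIN_REPEATS[k] and set(motif) <= set("ACGT")
                    and is_primitive(motif) and not any(claimed[i:j])):
                hits.append((i, j, motif, reps))
                for p in range(i, j):
                    claimed[p] = True
                i = j
            else:
                i += 1
    return sorted(hits)


def summarize(seq: str, hits) -> tuple[int, float]:
    """Total SSR length (bp) and the AT fraction of the SSR bases."""
    total = sum(end - start for start, end, _, _ in hits)
    at = sum(seq[s:e].upper().count("A") + seq[s:e].upper().count("T") for s, e, _, _ in hits)
    return total, (at / total if total else 0.0)


if __name__ == "__main__":
    toy = "GC" + "A" * 12 + "GC" + "AT" * 6 + "GCG"
    hits = find_ssrs(toy)
    print(hits)                  # [(2, 14, 'A', 12), (16, 28, 'AT', 6)]
    print(summarize(toy, hits))  # (24, 1.0)
```

The two summary values mirror the statistics reported for S. sinomontana in the Results (total SSR length and AT content).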

Comparison with other Saxifragaceae chloroplast genomes

– The lengths of the complete chloroplast genomes, as well as of each individual region, were compared between S. sinomontana and six other Saxifragaceae species: S. stolonifera (NC_037882), Bergenia scopulosa T. P. Wang (NC_036061), Chrysosplenium aureobracteatum Y. I. Kim & Y. D. Kim (NC_039740), Heuchera parviflora Bartling var. saurensis R. A. Folk (KR478645), Mukdenia rossii (Oliv.) Koidz. (NC_037495) and Oresitrophe rupifraga Bunge (NC_037514). GC-content (%) was calculated using MEGA version 7.0.26 (Kumar et al. 2016). Percent sequence identity among the complete chloroplast genomes of the seven Saxifragaceae species was analyzed and plotted using mVISTA (Frazer et al. 2004) with the LAGAN alignment algorithm (Brudno et al. 2003), a cut-off of 70% identity and the S. sinomontana annotation as reference. Junction sites of the LSC, IR and SSC regions among these Saxifragaceae chloroplast genomes were compared using IRscope (Amiryousefi et al. 2018).
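GC-content was computed with MEGA. The sketch below computes the same quantity directly, per region and for the whole genome. It assumes the genome sequence is linearized in the order LSC, IRb, SSC, IRa with known region lengths, and it excludes ambiguous bases (such as N) from the denominator; MEGA's exact conventions may differ. The function names are hypothetical.

```python
# GC-content of the whole plastome and of each quadripartite region.
# Assumes linear order LSC-IRb-SSC-IRa; ambiguous bases are excluded.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    return 100.0 * (s.count("G") + s.count("C")) / acgt if acgt else 0.0


def split_regions(genome: str, lsc: int, ir: int, ssc: int) -> dict[str, str]:
    if len(genome) != lsc + 2 * ir + ssc:
        raise ValueError("region lengths do not add up to the genome length")
    b1, b2, b3 = lsc, lsc + ir, lsc + ir + ssc
    return {"LSC": genome[:b1], "IRb": genome[b1:b2], "SSC": genome[b2:b3], "IRa": genome[b3:]}


def region_gc(genome: str, lsc: int, ir: int, ssc: int) -> dict[str, float]:
    regions = split_regions(genome, lsc, ir, ssc)
    out = {name: round(gc_percent(seq), 1) for name, seq in regions.items()}
    out["total"] = round(gc_percent(genome), 1)
    return out


def irs_match(genome: str, lsc: int, ir: int, ssc: int) -> bool:
    """In a correctly split genome, IRa is the reverse complement of IRb."""
    r = split_regions(genome, lsc, ir, ssc)
    return r["IRa"].upper() == reverse_complement(r["IRb"]).upper()


if __name__ == "__main__":
    toy = "AT" * 5 + "GGCC" + "AAGG" + "GGCC"  # hypothetical toy genome
    print(region_gc(toy, 10, 4, 4))
    print(irs_match(toy, 10, 4, 4))  # True
```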

Phylogenetic analysis

– To assess the potential of chloroplast genomes for resolving phylogenetic relationships among Saxifragaceae species, we inferred the relationships of the above-mentioned species, using Ribes fasciculatum Siebold & Zucc. var. chinense Maxim. (MH191388) and Itea chinensis Hook. & Arn. (NC_037884) as outgroups. A set of 66 protein-coding genes (PCGs) shared by all species (psaA, psaB, psaC, psaI, psaJ, psbA, psbC, psbD, psbE, psbF, psbH, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ, petA, petB, petD, petG, petL, petN, atpA, atpB, atpE, atpF, atpH, atpI, rbcL, ndhA, ndhB, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK, rpl2, rpl14, rpl16, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36, rps2, rps3, rps4, rps7, rps8, rps11, rps14, rps15, rps16, rps18, rps19, rpoA, rpoB, rpoC1 and rpoC2) was extracted from each of the selected chloroplast genomes. The individual PCGs were concatenated into a single matrix and then aligned using MEGA version 7.0.26 (Kumar et al. 2016). Maximum likelihood (ML) analysis was performed using RAxML version 8.1.21 (Stamatakis 2014) as implemented in raxmlGUI version 1.5b2 (Silvestro and Michalak 2012). The best-fit substitution model, GTR + I + G, was selected under the Akaike Information Criterion (AIC) using jModelTest version 2.1.4 (Darriba et al. 2012). Bootstrap support was assessed with 1000 replicates.
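The concatenation and bootstrap steps can be illustrated with a short sketch. It is not the authors' workflow: it assumes each gene has already been aligned (align-then-concatenate, a common alternative to aligning the concatenated matrix), records gene partitions in 1-based column coordinates, and draws one nonparametric bootstrap pseudoreplicate by resampling alignment columns with replacement, which is the idea behind the 1000 bootstrap replicates run in RAxML. All function names and the toy sequences are hypothetical.

```python
import random


def concatenate(gene_alignments):
    """gene_alignments: gene -> taxon -> aligned sequence (equal lengths within a gene).

    Returns (supermatrix, partitions): supermatrix maps taxon -> concatenated
    sequence; partitions lists (gene, start, end) in 1-based inclusive columns.
    """
    taxa = sorted(next(iter(gene_alignments.values())))
    parts = {t: [] for t in taxa}
    partitions = []
    pos = 1
    for gene in sorted(gene_alignments):  # fixed gene order for reproducibility
        aln = gene_alignments[gene]
        if sorted(aln) != taxa:
            raise ValueError(f"{gene}: taxon set differs from the other genes")
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences have unequal lengths (not aligned)")
        length = lengths.pop()
        for t in taxa:
            parts[t].append(aln[t])
        partitions.append((gene, pos, pos + length - 1))
        pos += length
    return {t: "".join(seqs) for t, seqs in parts.items()}, partitions


def bootstrap_replicate(matrix, rng):
    """One nonparametric bootstrap pseudoreplicate: sample columns with replacement."""
    n_cols = len(next(iter(matrix.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {t: "".join(seq[c] for c in cols) for t, seq in matrix.items()}


if __name__ == "__main__":
    # Toy example with hypothetical sequences for two taxa and two genes.
    toy = {"rbcL": {"A": "ATG", "B": "ATC"}, "psbA": {"A": "GG", "B": "GA"}}
    matrix, partitions = concatenate(toy)
    print(matrix)      # {'A': 'GGATG', 'B': 'GAATC'}
    print(partitions)  # [('psbA', 1, 2), ('rbcL', 3, 5)]
    print(bootstrap_replicate(matrix, random.Random(42)))
```

A tree inferred from each pseudoreplicate, and the frequency with which a clade recurs across replicates, gives that clade's bootstrap support.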

3 Results

Genome content and structure of S. sinomontana

– The chloroplast genome of S. sinomontana is a closed circular molecule of 147,240 bp with a typical quadripartite structure, comprising an LSC region of 79,310 bp and an SSC region of 16,874 bp separated by a pair of IRs of 25,528 bp each (Fig. 1; Table 3). The GC-contents of the LSC, SSC and IR regions and of the overall chloroplast genome of S. sinomontana are 36.2%, 32.0%, 42.9% and 38.0%, respectively (Table 3). Comparative analysis revealed that the S. sinomontana chloroplast genome is the smallest of the seven Saxifragaceae chloroplast genomes, 3826 bp shorter than that of its congener S. stolonifera (Table 3). Although the sizes of the overall genome and of the LSC, SSC and IR regions differ to some extent, the GC-contents of the complete chloroplast genomes and of each individual region are similar among the seven Saxifragaceae species (Table 3).
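The reported region lengths and GC-contents can be checked against each other with simple arithmetic. The sketch assumes the sequence begins at the first base of the LSC and runs LSC, IRb, SSC, IRa in 1-based coordinates; the junction positions it prints are implied by the lengths, not read from the GenBank record. It also recomputes the whole-genome GC-content as a length-weighted mean of the (rounded) regional values, counting the IR twice.

```python
# Consistency check of the S. sinomontana region lengths and GC-contents.
# Assumes the sequence starts at base 1 of the LSC, order LSC-IRb-SSC-IRa.

LSC, IR, SSC = 79_310, 25_528, 16_874
TOTAL = 147_240
GC = {"LSC": 36.2, "IR": 42.9, "SSC": 32.0}  # percent, as reported


def junctions(lsc: int, ir: int, ssc: int) -> dict[str, int]:
    """Last base of each region; each junction lies between this base and the next."""
    return {
        "JLB (LSC|IRb)": lsc,
        "JSB (IRb|SSC)": lsc + ir,
        "JSA (SSC|IRa)": lsc + ir + ssc,
        "JLA (IRa|LSC)": lsc + 2 * ir + ssc,
    }


def weighted_gc(lsc: int, ir: int, ssc: int, gc: dict[str, float]) -> float:
    """Whole-genome GC% as a length-weighted mean of regional GC%; IR counted twice."""
    total = lsc + 2 * ir + ssc
    return (lsc * gc["LSC"] + 2 * ir * gc["IR"] + ssc * gc["SSC"]) / total


assert LSC + 2 * IR + SSC == TOTAL  # 79,310 + 2 x 25,528 + 16,874 = 147,240
for name, pos in junctions(LSC, IR, SSC).items():
    print(f"{name}: after base {pos:,}")
print(f"weighted GC: {weighted_gc(LSC, IR, SSC, GC):.1f}%")  # weighted GC: 38.0%
```

With the reported values the lengths sum exactly to 147,240 bp, the implied junctions fall after bases 79,310, 104,838, 121,712 and 147,240, and the weighted GC-content rounds to the reported 38.0%.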

Fig. 1

Chloroplast genome map of Saxifraga sinomontana. Genes drawn inside the outer circle are transcribed clockwise, and those outside are transcribed counterclockwise. Genes belonging to different functional groups are shown in different colors. In the innermost circle, the darker gray corresponds to GC-content and the lighter gray to AT-content

The chloroplast genome of S. sinomontana contains 79 unique protein-coding genes, seven of which are completely duplicated (ndhB, rpl2, rpl23, rps7 and ycf2) or partially duplicated (rps19 and ycf1) in the IRs. Four rRNA genes were identified, all of which are completely duplicated in the IR regions. In addition, 30 unique tRNA genes were annotated, seven of which are duplicated. In total, the S. sinomontana chloroplast genome contains 131 genes (Table 1). Of these, 17 genes contain introns (11 protein-coding genes and six tRNA genes); all of them contain a single intron except ycf3 and clpP. As in most other angiosperms (Nie et al. 2012; Hu et al. 2015; Wang et al. 2017; Yan et al. 2019), the rps12 gene is trans-spliced, with its 5’-end exon located in the LSC region and its 3’-end exon duplicated in the IRs. Overall, both gene content and gene order in the S. sinomontana chloroplast genome are similar to those of other Saxifragaceae species.
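The gene totals above can be reconciled with a short tally. It only restates the counts given in the text and, as the text does, counts the two partially duplicated genes (rps19 and ycf1) among the seven duplicated protein-coding genes.

```python
# Tally of the gene counts given in the text for the S. sinomontana plastome.
unique_genes = {"protein-coding": 79, "rRNA": 4, "tRNA": 30}
# Second copies in the IRs; the protein-coding count includes the
# partially duplicated rps19 and ycf1, as in the text.
ir_duplicates = {"protein-coding": 7, "rRNA": 4, "tRNA": 7}
intron_containing = {"protein-coding": 11, "tRNA": 6}

total_unique = sum(unique_genes.values())                  # 79 + 4 + 30 = 113
total_genes = total_unique + sum(ir_duplicates.values())  # 113 + 18 = 131
print(total_unique, total_genes, sum(intron_containing.values()))  # 113 131 17
```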

Table 1 Genes present in Saxifraga sinomontana chloroplast genome

Simple sequence repeats analysis

– Sixty-one SSRs, ranging from 10 to 17 bp in length, were detected in the S. sinomontana chloroplast genome (Table 2). Together they span 713 bp with an AT content of 95.1%, much higher than the genome-wide AT content (62.0%). Mononucleotide repeats are the most abundant SSR type (41 of 61, i.e. 67.2% of all SSR loci), and 40 of the 41 mononucleotide repeats are of the A or T type. The numbers of di-, tri-, tetra- and pentanucleotide repeats are 10, five, four and one, respectively, and no hexanucleotide repeats were detected. Of the 61 SSRs, 35 are located in intergenic regions, 11 in introns and 15 in protein-coding genes, including ycf1, rpoB, atpB, ndhF, rpoC1, rps14, psbC, rpoC2 and rpl22 (Table 2).

Table 2 Simple sequence repeats in Saxifraga sinomontana chloroplast genome

Chloroplast genome comparisons

– The overall sequence identity of the seven Saxifragaceae chloroplast genomes was plotted using mVISTA with the S. sinomontana annotation as reference (Fig. 2). The Saxifragaceae chloroplast genomes show a high level of synteny, suggesting a conserved evolutionary pattern. The IR regions are less divergent than the LSC and SSC regions. The most divergent coding regions among the seven chloroplast genomes are matK, ndhK, accD, cemA, rpoA, rps19, ndhF, ccsA, ndhD and ycf1. Overall, however, the most divergent regions are located in intergenic spacers, including trnK-UUU-rps16, rps16-trnQ-UUG, trnS-GCU-trnG-GCC, atpH-atpI, trnE-UUC-trnT-GGU, trnT-UGU-trnL-UAA, ndhC-trnV-UAC, petA-petJ, psbE-petL, ndhF-rpl32, rpl32-trnL-UAG and trnH-GUG-psbA (Fig. 2).
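mVISTA plots percent identity in a sliding window along the reference. The sketch below computes a comparable profile from one pairwise alignment. The window and step sizes, the gap handling (a query gap counts as a mismatch, and columns where the reference has a gap are dropped so positions stay in reference coordinates) and the function names are assumptions, not mVISTA's exact settings. Windows below the 70% cut-off used here would be candidate divergent regions.

```python
# mVISTA-style identity profile from one pairwise alignment (sketch).
# Window/step sizes and gap handling are assumptions, not mVISTA's settings.


def window_identity(ref_aln: str, qry_aln: str, window: int = 100, step: int = 25):
    """Return (start, percent identity) per window; start is a 1-based reference position."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where the reference has a gap so positions are reference
    # coordinates; a gap in the query therefore counts as a mismatch.
    matches = [r == q for r, q in zip(ref_aln.upper(), qry_aln.upper()) if r != "-"]
    if not matches:
        return []
    profile = []
    for start in range(0, max(len(matches) - window, 0) + 1, step):
        win = matches[start:start + window]
        profile.append((start + 1, 100.0 * sum(win) / len(win)))
    return profile


def below_cutoff(profile, cutoff: float = 70.0):
    """Window starts whose identity falls below the cut-off (70% in the text)."""
    return [start for start, ident in profile if ident < cutoff]


if __name__ == "__main__":
    ref = "A" * 10 + "C" * 10   # hypothetical toy alignment
    qry = "A" * 10 + "G" * 10
    prof = window_identity(ref, qry, window=10, step=10)
    print(prof)                # [(1, 100.0), (11, 0.0)]
    print(below_cutoff(prof))  # [11]
```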

Fig. 2

Percent sequence identity between the chloroplast genomes of Saxifraga sinomontana and six other Saxifragaceae species, plotted using mVISTA. Gray arrows on the top line show the direction of transcription. The y-axis represents the average percent identity between S. sinomontana and the six other Saxifragaceae chloroplast genomes. The x-axis represents coordinates in the chloroplast genome, with S. sinomontana as reference. Genome regions are color-coded as exons, tRNAs, rRNAs and conserved non-coding sequences (CNS)

In this study, the LSC/IRB/SSC/IRA boundaries and adjacent genes were compared across the seven Saxifragaceae chloroplast genomes (Fig. 3). Compared with the other chloroplast genomes, the B. scopulosa chloroplast genome shows dramatic variation at the IRB/SSC and SSC/IRA boundaries: its ndhF and duplicated ycf1 genes are located entirely within the SSC region, resulting in the largest SSC (21,920 bp) but the smallest IR (23,811 bp) among the seven Saxifragaceae chloroplast genomes (Table 3; Fig. 3). IR expansion/contraction was also detected in the remaining six chloroplast genomes: the LSC/IRB border lies within the coding region of rps19 or rpl22, or within the intergenic spacer between rps19 and rpl2; the IRB/SSC junction falls within the ycf1 pseudogene and/or the ndhF gene; and the SSC/IRA border lies within the ycf1 gene, but with different extensions (Fig. 3). The IRA/LSC junction, however, shows a more complex pattern among these six chloroplast genomes, lying (1) within the coding region of trnH-GUG (S. sinomontana); (2) within the intergenic spacer between the partially or completely duplicated rps19 gene and trnH-GUG (S. stolonifera and H. parviflora var. saurensis); or (3) within the rpl2-trnH-GUG intergenic spacer, with no duplicated rps19 between them (C. aureobracteatum, M. rossii and O. rupifraga).
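Descriptions such as "within the coding region of rps19" or "within the intergenic spacer between rps19 and rpl2" amount to locating a junction coordinate relative to gene intervals. A minimal sketch of that classification follows. It is not IRscope's implementation, it ignores wrap-around on the circular molecule, and the function name and gene coordinates in the example are hypothetical toy values.

```python
# Classify where a junction falls relative to annotated genes (sketch).
# Coordinates are 1-based and inclusive; `junction` is the last base before
# the boundary. Circular wrap-around is not handled.


def locate_junction(junction: int, genes: list[tuple[str, int, int]]) -> str:
    """genes: (name, start, end) tuples with start <= end."""
    # A boundary between bases j and j + 1 lies inside a gene if both bases do.
    for name, start, end in genes:
        if start <= junction < end:
            return f"within {name}"
    # Otherwise report the nearest gene on each side (an intergenic spacer).
    left = max((g for g in genes if g[2] <= junction), key=lambda g: g[2], default=None)
    right = min((g for g in genes if g[1] > junction), key=lambda g: g[1], default=None)
    left_name = left[0] if left else "sequence start"
    right_name = right[0] if right else "sequence end"
    return f"between {left_name} and {right_name}"


if __name__ == "__main__":
    # Hypothetical toy annotation (not real coordinates from any genome).
    toy_genes = [("rpl22", 100, 400), ("rps19", 420, 700), ("rpl2", 760, 2250)]
    print(locate_junction(500, toy_genes))  # within rps19
    print(locate_junction(730, toy_genes))  # between rps19 and rpl2
```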

Fig. 3

Comparison of the junction sites of the LSC, IR and SSC regions among seven Saxifragaceae chloroplast genomes. Selected genes are indicated by boxes along the genome. Genes shown above the genome lines are transcribed in the forward direction, and those below in the reverse direction. JLB, junction between LSC and IRB; JSB, junction between SSC and IRB; JSA, junction between SSC and IRA; JLA, junction between LSC and IRA

Table 3 Length and GC-content of the complete plastome, LSC, SSC and IR regions in seven Saxifragaceae species

Phylogenetic analysis

– A set of 66 protein-coding genes shared among the seven Saxifragaceae chloroplast genomes was extracted, and phylogenetic relationships were reconstructed with R. fasciculatum var. chinense and I. chinensis as outgroups. The ML tree topology is consistent with previous studies and confirms Saxifraga sensu stricto as a monophyletic clade (Deng et al. 2015; Tkach et al. 2015). Bootstrap support based on the 66 shared genes was high: all nodes but one received 100% bootstrap support (Fig. 4).

Fig. 4

The maximum-likelihood phylogenetic tree of seven Saxifragaceae species using 66 protein-coding genes of chloroplast genomes. Numbers above nodes are bootstrap support values

4 Discussion

The genome structure of S. sinomontana is consistent with that of most land plants (Palmer 1985), and its size (147,240 bp) falls within the 120–160 kb range of angiosperm chloroplast genomes (Palmer 1985). In general, gene content and genome organization are far more conserved in angiosperm chloroplast genomes than in nuclear and mitochondrial genomes (Wicke et al. 2011). However, extensive gene losses and large inversions have been detected in several lineages, such as Gentianaceae (Fu et al. 2016; Sun et al. 2018), Asteraceae (Jansen and Palmer 1987; Liu et al. 2013; Walker et al. 2014) and Leguminosae (Doyle et al. 1992, 1996). In the present study, no extensive gene losses or large inversions were detected among the Saxifragaceae chloroplast genomes. Moreover, although the S. sinomontana chloroplast genome is the smallest among the seven Saxifragaceae species compared, its organization and gene content are highly similar to those of the other six.

Chloroplast SSRs (cpSSRs) usually show high levels of variation and are widely used in polymorphism surveys, population genetics and phylogenetic analyses (Powell et al. 1995; Provan et al. 1999, 2001; Flannery et al. 2006; Xue et al. 2012; Li et al. 2019). The number of cpSSRs in S. sinomontana (61) is moderate compared with that in other angiosperms (Liu et al. 2013; Fu et al. 2016; Yan et al. 2019). This and previous studies (Liu et al. 2013; Fu et al. 2016; Yan et al. 2019) suggest that an extremely high AT content of cpSSRs is common in the chloroplast genomes of higher plants. In S. sinomontana, the predominance of A/T mononucleotide repeats is consistent with the previous finding that short polyadenine or polythymine repeats are the main contributors to chloroplast SSRs (Kuang et al. 2011), and the distribution of cpSSRs between coding and non-coding regions is consistent with that of most angiosperm species (Nie et al. 2012).

Although chloroplast genomes are considered to be highly conserved among angiosperm species, regions of high sequence polymorphism are frequently observed even among closely related species (Kim and Jansen 1995). These highly divergent regions are widely used in plant phylogenetics, population genetics and DNA barcoding studies. As in other angiosperms (Nie et al. 2012; Liu et al. 2013; Chi et al. 2018; Yan et al. 2019), the coding regions of the seven Saxifragaceae chloroplast genomes are more conserved than the non-coding regions. Among the ten most divergent coding regions in this study, matK is regarded as a core universal DNA barcode in many plant groups (Li et al. 2019), and ycf1 has recently been widely applied in plant phylogenetic and DNA barcoding studies (Dong et al. 2015; Yang et al. 2017). In addition, accD, rps19, ccsA and ndhF have been shown to be highly divergent in various plant lineages and can be applied in phylogenetic studies (Ni et al. 2016; Ivanova et al. 2017). The positions of the IR/LSC and IR/SSC boundaries differ among plant species (Nie et al. 2012; Ni et al. 2016; Yan et al. 2019; Li et al. 2019), and expansion/contraction of the IR regions often leads to size variation in chloroplast genomes (Wang et al. 2008). Within Saxifragaceae, although IR contraction/expansion varies among species, IR length does not track total chloroplast genome size (Nie et al. 2012).

A number of phylogenetic studies have aimed to clarify relationships within the family Saxifragaceae (Soltis et al. 2001, 2013; Xiang et al. 2012; DeChaine et al. 2013; Zhu et al. 2013; Deng et al. 2015) or within the genus Saxifraga (Zhang et al. 2008; Tkach et al. 2015; Gao et al. 2015). However, infrasectional relationships within the most species-rich section, Ciliatae, remain poorly resolved (Zhang et al. 2008; Tkach et al. 2015; Gao et al. 2015), mainly owing to the lack of resolution within the recently diverged Saxifraga sect. Ciliatae subsect. Hirculoideae, in which phylogenetically informative sites have been shown to be limited (Gao et al. 2015). Chloroplast genomes contain abundant phylogenetic information and could substantially improve our ability to resolve relationships in this lineage. In our analysis of seven Saxifragaceae species based on 66 protein-coding genes, nearly all nodes received high bootstrap support, suggesting a promising opportunity to resolve infrasectional relationships within sect. Ciliatae. This is the first chloroplast genome sequenced in S. sect. Ciliatae and the second in the genus Saxifraga. More taxa, especially from S. sect. Ciliatae subsect. Hirculoideae, should be included in comparative chloroplast genome analyses to realize the full potential of chloroplast genomes for the phylogenetics of S. sect. Ciliatae.