Background

The nuclear and chloroplast genomes are the major targets for plant species authentication and phylogenetic studies. Since the rate of evolution varies across each genome, different DNA regions may be selected to resolve different taxonomic levels. The criteria for a useful DNA marker for authentication are: (1) high interspecific divergence; (2) low intraspecific divergence; (3) short PCR product of around 1 kb; and (4) availability of universal primers for amplification [1, 2]. The Consortium for the Barcode of Life (CBOL) set up a standardized sampling method and experimental protocol to analyze agreed-upon “DNA barcodes” [3]. This universal identification system is called DNA barcoding. Recently, the CBOL Plant Working Group recommended that rbcL and matK should be used as the land plant barcodes [4]. The former offers high universality and good discrimination power, while the latter has higher resolution than other loci. However, the differentiation powers of rbcL and matK may not be sufficient for closely related species [5]. Indeed, many land plants are identified using other DNA regions as markers.

The internal transcribed spacer (ITS) of the nuclear ribosomal cistron consists of ITS1 and ITS2, and has been demonstrated to be useful for phylogenetic studies in many angiosperm families [6]. Recently, over 60,000 ITS sequences of plants and animals from GenBank were compared [7]. At the species level, the success rates of identification were 91.9%, 76.1%, 74.2%, 67.1%, 88.1% and 77.4% for animals, dicotyledons, monocotyledons, gymnosperms, ferns and mosses, respectively. However, ITS regions are found in plants, animals and fungi, and fungal ITS regions in medicinal materials are occasionally co-amplified, which makes direct sequencing of the amplified DNA product unsuccessful. The non-transcribed spacer of 5S rRNA is highly variable, and some studies have illustrated that its resolving power is higher than that of the ITS sequences [8]. In the chloroplast genome, the trnH-psbA spacer is a rapidly evolving region suitable for identification at the species level [9]. Other chloroplast DNA loci, including trnL-F, have been demonstrated to be informative at the generic level [10]. In a recent study, trnL-F has also been used to separate Cardiocrinum giganteum from its variety C. giganteum var. yunnanense and their closely related species [11].

Four medicinal Gentiana species, namely Gentiana manshurica Kitag., Gentiana scabra Bunge, Gentiana triflora Pall., and Gentiana rigescens Franch., are listed in the Chinese Pharmacopoeia as Gentianae Radix et Rhizoma or “Longdan” in Chinese [12]. They are common medicinal materials used for treating liver diseases [13] and are hepatoprotective against acetaminophen-induced acute toxicity [14]. The first three species are mainly distributed in the northeastern part of China and called “Guanlongdan” (GL), while G. rigescens is distributed in the southwestern part of China and called “Jianlongdan” (JL). The genus Gentiana is divided into 12 sections in China [15]. GL and JL belong to the adjacent sections of Pneumonanthe (Section III) and Monopodiae (Section IV), respectively. While different plant species may be used for the same medicinal purpose in Chinese medicine (e.g. Gentiana rhodantha Franch. is frequently used as a substitute in southwestern China), the neurotoxic Podophyllum hexandrum Royle in the family Berberidaceae, which has a similar morphology, is deemed an adulterant [16].

This study aims to evaluate the feasibility and limitations of rbcL and matK and five other commonly used DNA regions for authentication of medicinal Gentiana species and their adulterants, G. rhodantha and P. hexandrum. In particular, the sequence divergences and differentiation powers of the tested regions were determined and compared.

Methods

Authentic samples were collected from various regions of China and identified by Dr. Hui Cao based on morphological characters (Table 1) [17]. The voucher specimens were deposited in the Institute of Chinese Medicine, The Chinese University of Hong Kong.

Table 1 Samples studied

The rhizome of each sample (0.05 g) was ground, and total DNA was extracted by a CTAB extraction method [18] with one minor modification: the DNA pellet was resuspended in 30 μL of water instead of 50 μL of Tris-EDTA buffer. Polymerase chain reaction was performed in a 25-μL mixture. Details of the primer sequences and the respective amplified regions are presented in Table 2. The specific PCR products were isolated from the PCR mixture by a Gel-M™ Gel Extraction System (Viogene, Taiwan). Except for 5S rRNA, the purified PCR products of the DNA barcodes were directly sequenced. The 5S rRNA PCR product was ligated into the pGEM-T Easy vector (Promega, USA) at 25°C for 2 hours. Three to four clones containing the insert were sequenced for each individual sample. A Rapid Plasmid Miniprep System (Viogene, Taiwan) was used for plasmid extraction. The purified PCR products or plasmids were sequenced using a BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, USA). Alignment of the DNA sequences was accomplished by ClustalW using the BioEdit program [19, 20], and manual adjustment of the sequence alignment was performed if necessary. The genetic distance among samples was determined using the Kimura 2-parameter nucleotide model in MEGA 5 [21]. All distances were calculated from pairwise global alignments, in which alignment gaps and missing data were eliminated by choosing the “pairwise deletion” option. If the minimum sequence divergence between two groups of species was larger than the maximum intraspecific sequence divergence of the two groups of species, the discrimination was considered successful. Phylogenetic trees of the seven loci were constructed in MEGA 5 with the neighbor-joining (NJ) method [21]. Bootstrap analyses with 1000 replicates were performed to provide confidence estimates for the tree topologies.
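The distance calculation and the discrimination criterion described above can be sketched in a few lines of Python. This is a minimal illustration and not the MEGA 5 implementation: the function names `k2p_distance` and `discrimination_successful` are hypothetical, the sequences are assumed to be already aligned, and any site with a gap or ambiguity code in either sequence is skipped to mimic pairwise deletion.

```python
import math
from itertools import combinations

BASES = set("ACGT")
# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in BASES or b not in BASES:
            continue  # gap or ambiguity: skip this site for this pair only
        compared += 1
        if a == b:
            continue
        if frozenset((a, b)) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    if transitions == 0 and transversions == 0:
        return 0.0
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    # math.log raises ValueError if the pair is too divergent for the model.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def discrimination_successful(group_a, group_b):
    """Each group maps a species name to a list of aligned sequences.

    Returns True when the minimum divergence between the two groups exceeds
    the maximum intraspecific divergence found within either group.
    """
    min_between = min(
        k2p_distance(a, b)
        for seqs_a in group_a.values() for a in seqs_a
        for seqs_b in group_b.values() for b in seqs_b
    )
    max_within = max(
        (k2p_distance(a, b)
         for group in (group_a, group_b)
         for seqs in group.values()
         for a, b in combinations(seqs, 2)),
        default=0.0,  # used when no species has two or more sequences
    )
    return min_between > max_within
```

With real data, each group would hold the aligned sequences of one locus, for example the three GL species in one dictionary and G. rigescens in the other.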

Table 2 Universal primers used in this study

Results

DNA barcode sequence determination

The primers listed in Table 2 could amplify the corresponding loci in the samples, except for the trnL-F region of sample 2005-2706b (G. rhodantha) and the matK regions of samples 2005-2703b (G. triflora), 2005-2704a (G. rigescens), 2005-2704b (G. rigescens), 2005-2706a (G. rhodantha), ICM 686 (P. hexandrum) and ICM 2148 (P. hexandrum). We therefore checked the complementarity between the primers and the available Gentiana sequences from the National Center for Biotechnology Information (NCBI). For matK, there were 3–4 nucleotide differences between 3F_KIM f and the Gentiana sequences (Table 3). We designed a new pair of primers, matK_GSC_F (5′-TATATATTGTATTCGATACAAAC-3′) and matK_GSC_R (5′-TTCTACGAATATTGGAATTGGAA-3′), based on the conserved region of the available Gentiana and P. hexandrum sequences, which successfully amplified all of the Gentiana and P. hexandrum samples. For trnL-F, there was only one nucleotide difference near the 5′ terminus of the primer. Since only sample 2005-2706b (G. rhodantha) could not be amplified, the failure of amplification might have been caused by fragmentation of the template DNA.
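The primer check described above amounts to counting mismatches between a primer and its binding site. The following sketch illustrates the idea; the helper names are hypothetical, and the binding site is an invented example derived from the matK_GSC_F sequence given in the text, not a real Gentiana sequence.

```python
def primer_mismatches(primer, binding_site):
    """Return 1-based positions, counted from the primer's 5' end, that differ."""
    if len(primer) != len(binding_site):
        raise ValueError("primer and binding site must have the same length")
    pairs = zip(primer.upper(), binding_site.upper())
    return [pos for pos, (p, s) in enumerate(pairs, start=1) if p != s]


def reverse_complement(seq):
    """Reverse complement, for comparing a reverse primer with the sense strand."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


MATK_GSC_F = "TATATATTGTATTCGATACAAAC"  # forward primer sequence from the text

# Hypothetical template: the primer sequence with two bases changed,
# used only to demonstrate the mismatch report.
site = list(MATK_GSC_F)
site[1] = "C"
site[9] = "C"
hypothetical_site = "".join(site)

mismatch_positions = primer_mismatches(MATK_GSC_F, hypothetical_site)
print(mismatch_positions)  # positions 2 and 10 differ in this invented example
```

Reporting positions rather than a bare count is a deliberate choice, because a mismatch near the 5′ terminus, as observed for trnL-F, is generally tolerated better than one near the 3′ end.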

Table 3 Sequence alignment of the matK forward primer 3F_KIM f and the corresponding binding sites of the Gentiana species available in NCBI

Sequence divergences of the seven DNA regions

The sizes of the seven loci (excluding the primer-binding sites) of the examined species are shown in Table 4. The sizes ranged from 239 to 940 bp, with most falling between 400 and 800 bp, the optimum range for routine PCR. The lengths of the protein-encoding genes rbcL and matK were identical across the samples, while the lengths of the five intergenic spacers varied.

Table 4 Properties of the seven barcoding regions of Gentiana and P. hexandrum

To show the discriminative powers of the seven DNA regions, we compared the sequence divergence of (1) medicinal Gentiana species (G. scabra, G. manshurica, G. triflora and G. rigescens) and their adulterants (G. rhodantha and P. hexandrum); and (2) GL (G. scabra, G. manshurica and G. triflora) and JL (G. rigescens) (Table 5). When comparing the divergences between medicinal Gentiana species and their adulterants, 5S rRNA had the highest divergence values, both interspecifically and intraspecifically, while rbcL had the lowest values (Table 5). The minimum divergence values of rbcL, matK, trnH-psbA, trnL-F, rpl36-rps8, ITS and 5S rRNA between medicinal Gentiana and P. hexandrum were 0.0995, 0.3399, 0.3908, 0.3888, 0.2211, 0.4781 and 0.6154, while the maximum intraspecific divergence values were only 0.0018, 0.0042, 0.0101, 0.0026, 0.0033, 0.0058 and 0.0939, respectively. For medicinal Gentiana and the adulterant G. rhodantha, the minimum divergence values for these regions were 0.0128, 0.0597, 0.1349, 0.1562, 0.0958, 0.0862 and 0.3098, while the maximum intraspecific divergence values were 0.0018, 0.0042, 0.0101, 0.0026, 0.0033, 0.0058 and 0.0914, respectively. Since the maximum intraspecific divergences of the seven loci were lower than the minimum interspecific divergences, all of them could be employed to discriminate between medicinal Gentiana species and their adulterants.
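The comparison above can be reproduced directly from the reported values. The sketch below computes, for each locus, the difference between the minimum interspecific and the maximum intraspecific divergence, often called the barcode gap. The numbers are those quoted in the text, while the variable and function names are illustrative assumptions.

```python
LOCI = ["rbcL", "matK", "trnH-psbA", "trnL-F", "rpl36-rps8", "ITS", "5S rRNA"]

# Medicinal Gentiana versus P. hexandrum (values quoted in the text).
MIN_INTER_PODOPHYLLUM = [0.0995, 0.3399, 0.3908, 0.3888, 0.2211, 0.4781, 0.6154]
MAX_INTRA_PODOPHYLLUM = [0.0018, 0.0042, 0.0101, 0.0026, 0.0033, 0.0058, 0.0939]

# Medicinal Gentiana versus G. rhodantha (values quoted in the text).
MIN_INTER_RHODANTHA = [0.0128, 0.0597, 0.1349, 0.1562, 0.0958, 0.0862, 0.3098]
MAX_INTRA_RHODANTHA = [0.0018, 0.0042, 0.0101, 0.0026, 0.0033, 0.0058, 0.0914]


def barcode_gaps(min_inter, max_intra):
    """Map each locus to (minimum interspecific - maximum intraspecific)."""
    return {
        locus: round(inter - intra, 4)
        for locus, inter, intra in zip(LOCI, min_inter, max_intra)
    }


gaps_podophyllum = barcode_gaps(MIN_INTER_PODOPHYLLUM, MAX_INTRA_PODOPHYLLUM)
gaps_rhodantha = barcode_gaps(MIN_INTER_RHODANTHA, MAX_INTRA_RHODANTHA)

for name, gaps in [("P. hexandrum", gaps_podophyllum),
                   ("G. rhodantha", gaps_rhodantha)]:
    # A positive gap means the locus separates the two groups.
    print(name, gaps, all(gap > 0 for gap in gaps.values()))
```

Every gap is positive in both comparisons, and rbcL has the narrowest gap in each, which is consistent with it being the least divergent locus.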

Table 5 Sequence divergence of the seven barcoding regions of Gentiana and P. hexandrum

The DNA sequences differed markedly between GL and JL. The minimum divergence values of rbcL, matK, trnH-psbA, trnL-F, rpl36-rps8, ITS and 5S rRNA between these two groups were 0.0109, 0.0521, 0.0780, 0.0332, 0.0392, 0.0462 and 0.4897, while the maximum intraspecific divergence values were 0.0018, 0.0042, 0.0101, 0.0026, 0.0000, 0.0043 and 0.0914, respectively. Therefore, GL and JL could be distinguished from each other using any of the seven DNA loci (Table 5). On the other hand, the genetic variability among the three GL species was extremely low for all loci. Only 5S rRNA could differentiate between G. manshurica and G. triflora, while trnL-F could distinguish G. scabra from G. triflora. Table 4 shows the selected polymorphic sites for differentiating among the three GL species. G. triflora, G. scabra and G. manshurica are genetically closely related and have interchangeable medicinal applications.

To assess the effectiveness of rbcL and matK in the identification of Gentiana species, we included all available Gentiana sequences in NCBI in the analysis of these two barcodes. In total, 14 rbcL sequences (including 10 sequences generated in this study) of 9 Gentiana species and 68 matK sequences (including 10 sequences generated in this study) of 23 Gentiana species and subspecies were aligned. For rbcL, the maximum intraspecific divergence value was 0.00215, while the minimum interspecific divergence value was 0. We found that the rbcL sequences of Gentiana andrewsii (HQ590117.1) and Gentiana pneumonanthe (JN891473.1) were identical. For matK, the maximum intraspecific divergence value was 0.01032, while the minimum interspecific divergence value was 0. Twenty sequences were identical, including 12 samples of Gentiana verna (EF552088.1–EF552099.1), one sample of Gentiana schleicheri (EF552087.1), three samples of Gentiana pumila subsp. delphinensis (EF552084.1–EF552086.1) and four samples of Gentiana brachyphylla subsp. favratii (EF552080.1–EF552083.1). These results indicated that rbcL and matK could not resolve all Gentiana species.
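A minimum interspecific divergence of 0 means that at least two taxa share an identical sequence. One simple way to detect such cases is to group the aligned sequences by identity and report every sequence shared by more than one taxon. The sketch below assumes a list of (taxon, sequence) pairs; the taxon labels and sequences are placeholders and not data from this study.

```python
from collections import defaultdict


def shared_sequences(records):
    """Return sorted taxon lists for every sequence found in more than one taxon.

    records: iterable of (taxon_name, aligned_sequence) pairs.
    """
    taxa_by_sequence = defaultdict(set)
    for taxon, sequence in records:
        taxa_by_sequence[sequence.upper()].add(taxon)
    return [sorted(taxa) for taxa in taxa_by_sequence.values() if len(taxa) > 1]


# Placeholder records for illustration only.
toy_records = [
    ("taxon_A", "ACGTTGCA"),
    ("taxon_A", "ACGTTGCA"),
    ("taxon_B", "ACGTTGCA"),  # identical to taxon_A: cannot be resolved
    ("taxon_C", "ACGTTGCT"),
]

unresolved = shared_sequences(toy_records)
print(unresolved)  # taxon_A and taxon_B share one sequence
```

Applied to the matK alignment, the four taxa listed above would appear together in one group, since their 12, 1, 3 and 4 sequences (20 in total) are identical.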

As shown in Figures 1–7, the NJ trees of the seven barcodes revealed that medicinal Gentiana species were clearly differentiated from P. hexandrum. Among the Gentiana species, the three GL species clustered together as a clade and were separated from JL and G. rhodantha with high bootstrap support (>70%), suggesting that GL, JL and G. rhodantha can be well resolved by the seven DNA barcodes.
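The bootstrap values in these trees come from resampling alignment columns with replacement and rebuilding the tree for each replicate. The sketch below illustrates only the resampling step and uses a simplified proxy for clade support: the fraction of replicates in which every within-group distance is smaller than every distance to an outside sequence. It does not build NJ trees and is not the MEGA 5 procedure. All names are hypothetical, and it uses the plain proportion of differing sites rather than K2P to stay short.

```python
import random
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b) / len(seq_a)


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement (one bootstrap pseudo-replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def group_support(alignment, group, replicates=1000, seed=0):
    """Fraction of replicates in which `group` is separated from all other sequences.

    alignment: dict mapping sample name -> aligned sequence (equal lengths).
    group: list of sample names; at least one sample must lie outside the group.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    outsiders = [name for name in alignment if name not in group]
    hits = 0
    for _ in range(replicates):
        rep = resample_columns(alignment, rng)
        max_within = max(
            (p_distance(rep[a], rep[b]) for a, b in combinations(group, 2)),
            default=0.0,
        )
        min_between = min(
            p_distance(rep[a], rep[b]) for a in group for b in outsiders
        )
        if min_between > max_within:
            hits += 1
    return hits / replicates


# Toy alignment for illustration only.
toy = {"g1": "AAAAAAAAAA", "g2": "AAAAAAAAAA", "out": "CCCCCCCCCC"}
support = group_support(toy, ["g1", "g2"], replicates=100)
```

In this toy alignment every column separates the group from the outside sequence, so the group is recovered in all replicates; with real data the support would depend on how many informative columns each replicate happens to sample.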

Figure 1

K2P distance NJ tree for rbcL. Consensus NJ tree for rbcL of Gentiana and P. hexandrum, constructed with 1000 bootstrap replicates. Bootstrap values are indicated at the branches (values below 50 are not shown).

Figure 2

K2P distance NJ tree for matK. Consensus NJ tree for matK of Gentiana and P. hexandrum, constructed with 1000 bootstrap replicates. Bootstrap values are indicated at the branches (values below 50 are not shown).

Figure 3

K2P distance NJ tree for trnH-psbA. Consensus NJ tree for trnH-psbA of Gentiana and P. hexandrum, constructed with 1000 bootstrap replicates. Bootstrap values are indicated at the branches (values below 50 are not shown).

Figure 4

K2P distance NJ tree for trnL-F. Consensus NJ tree for trnL-F of Gentiana and P. hexandrum, constructed with 1000 bootstrap replicates. Bootstrap values are indicated at the branches (values below 50 are not shown).

Figure 5

K2P distance NJ tree for rpl36-rps8. Consensus NJ tree for rpl36-rps8 of Gentiana and P. hexandrum, constructed with 1000 bootstrap replicates. Bootstrap values are indicated at the branches (values below 50 are not shown).

Figure 6

K2P distance NJ tree for ITS. Consensus NJ tree for ITS of Gentiana and P. hexandrum, constructed with 1000 bootstrap replicates. Bootstrap values are indicated at the branches (values below 50 are not shown).

Figure 7

K2P distance NJ tree for 5S rRNA. Consensus NJ tree for 5S rRNA of Gentiana and P. hexandrum, constructed with 1000 bootstrap replicates. Bootstrap values are indicated at the branches (values below 50 are not shown).

Discussion

This study performed a comparative assessment of the discriminative powers of seven DNA regions for the authentication of genetically closely related medicinal Gentiana species and their adulterants. rbcL and matK are the two recommended DNA barcodes that can resolve 72% of land plants when used in combination [4]. In our study, however, rbcL provided the lowest intraspecific and interspecific divergences. There were only 6 bp that differed out of 553 bp between GL and JL. It has also been shown that rbcL is the least divergent locus among 11 DNA barcode candidates for differentiating species in Solanaceae [1].
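As a rough plausibility check, the 6 differing sites out of 553 bp can be converted into a distance. The sketch below assumes that the 6 differences refer to a single GL–JL sequence pair and that no sites were excluded. Because the split between transitions and transversions is not stated, it brackets the Kimura 2-parameter distance between the two extreme cases.

```python
import math

differing_sites = 6
aligned_length = 553

# Uncorrected proportion of differing sites (p-distance).
p_dist = differing_sites / aligned_length

# K2P: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q),
# with P = transition proportion and Q = transversion proportion.
k2p_all_transitions = -0.5 * math.log(1 - 2 * p_dist)
k2p_all_transversions = (-0.5 * math.log(1 - p_dist)
                         - 0.25 * math.log(1 - 2 * p_dist))

print(f"p-distance:                {p_dist:.5f}")
print(f"K2P if all transitions:    {k2p_all_transitions:.5f}")
print(f"K2P if all transversions:  {k2p_all_transversions:.5f}")
```

Both bounds are close to 0.011, which is in line with the minimum GL–JL divergence of 0.0109 reported for rbcL in the Results.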

The other CBOL-recommended barcode matK had higher sequence divergence, but was difficult to amplify by PCR. There were mismatches between the primer and the published Gentiana sequences, indicating that the recommended matK primers might not be applicable to all land plants. A recent study of medicinal plants in Southern Morocco [25] shows that the success rate of PCR amplification of matK is less than 30%. Regarding the resolving power, matK had the third-highest value for differentiating between GL and JL (Table 5). Nevertheless, it was only ranked fifth and sixth for distinguishing between medicinal Gentiana species and their adulterants P. hexandrum and G. rhodantha, respectively.
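The rankings quoted above can be recovered from the minimum interspecific divergence values reported in the Results. The sketch below assumes that resolving power is ranked by that minimum divergence, highest first. The values are those given in the text, while the names are illustrative.

```python
LOCI = ["rbcL", "matK", "trnH-psbA", "trnL-F", "rpl36-rps8", "ITS", "5S rRNA"]

# Minimum interspecific divergence per locus, in LOCI order (from the Results).
MIN_DIVERGENCE = {
    "GL vs JL": [0.0109, 0.0521, 0.0780, 0.0332, 0.0392, 0.0462, 0.4897],
    "medicinal vs P. hexandrum": [0.0995, 0.3399, 0.3908, 0.3888, 0.2211, 0.4781, 0.6154],
    "medicinal vs G. rhodantha": [0.0128, 0.0597, 0.1349, 0.1562, 0.0958, 0.0862, 0.3098],
}


def rank_loci(values):
    """Rank loci from highest (rank 1) to lowest minimum divergence."""
    ordered = sorted(zip(LOCI, values), key=lambda pair: pair[1], reverse=True)
    return {locus: rank for rank, (locus, _) in enumerate(ordered, start=1)}


ranks = {comparison: rank_loci(values)
         for comparison, values in MIN_DIVERGENCE.items()}

for comparison, ranking in ranks.items():
    print(f"{comparison}: matK rank {ranking['matK']}, rbcL rank {ranking['rbcL']}")
```

Under this assumption matK ranks third for GL versus JL, fifth against P. hexandrum and sixth against G. rhodantha, while rbcL ranks last in all three comparisons.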

trnL-F had the longest DNA sequence among the tested loci (Table 4). One Gentiana sample could not be amplified, probably because of fragmentation of the template DNA, although other causes cannot be excluded. trnL-F had a high resolving power and was the only locus capable of differentiating G. scabra from G. triflora (Table 4), suggesting that trnL-F is a good locus for differentiating closely related Gentiana species.

rpl36-rps8 was among the smaller of the seven loci (Table 4). The PCR product of P. hexandrum was about 200 bp larger than those of Gentiana. Thus, the size difference could be used as a marker to distinguish Gentiana from P. hexandrum without DNA sequencing. Like rbcL, rpl36-rps8 also had low interspecific and intraspecific divergences, although its ranking was slightly higher than that of rbcL. Its major drawback was the limited number of reference sequences in GenBank.

The size of the trnH-psbA region ranged from 399 to 646 bp, which was moderate among the seven DNA regions (Table 4). There was a marked size difference between Gentiana and Podophyllum. In terms of resolving power, trnH-psbA ranked second for differentiating GL from JL, and provided higher resolving power than matK and rbcL. This intergenic spacer also shows a good amplification success rate and discrimination power among the nine loci tested [1]. Among 19 species in seven families of angiosperms, trnH-psbA shows nearly three-fold higher divergence than other tested chloroplast regions, while the ITS region exhibits two-fold higher divergence than trnH-psbA [1].

Some studies [26–28] show that nuclear ITS is an appropriate DNA marker for herbal authentication and plant phylogenetic studies. In our study, the ITS region was the third longest region across Gentiana and P. hexandrum, and the sizes differed slightly from one another (Table 4). The divergence ranking was average among the five Gentiana species, but increased to the second highest for distinguishing medicinal Gentiana and P. hexandrum (Table 5), indicating that the ITS regions among the studied Gentiana species were quite conserved.

The size of the 5S rRNA intergenic spacer regions ranged from 239 to 457 bp, which was the smallest but most varied (Table 4). Among the tested regions, only 5S rRNA could distinguish G. triflora from G. manshurica and G. scabra. Our study showed that the intraspecific divergence was high, which was probably due to the non-homogeneity of the different copies of the 5S rRNA gene spacer. It is essential to clone the amplified PCR product prior to sequencing to overcome the sequence degeneration issue.

Jiang et al. [29] established chemical profiles of Gentiana species. The four medicinal Gentiana species involved are closely similar in their chemical compositions, in that they all contain loganic acid, 6-O-β-D-glucopyranosylgentiopicroside, swertiamarin, gentiopicroside, and sweroside [29]. The compound 2-(o,m-dihydroxybenzyl)-sweroside is found only in G. rigescens [29] and can be used to differentiate GL and JL. Among the three GL species, only G. triflora contains gentiotrifloroside [29]. The chemical profiles of G. manshurica and G. scabra are nearly identical, except that the former has a higher sweroside content [29]. The chemical profiles therefore support our observations in the DNA barcode analyses.

Conclusion

All the tested loci could differentiate medicinal Gentiana species from their adulterants, and distinguish GL from JL. The two official DNA barcodes, rbcL and matK, have no significant advantages over the remaining five loci examined.