The specific DNA barcodes based on chloroplast genes for species identification of Theaceae plants

  • Research Article
  • Physiology and Molecular Biology of Plants

Abstract

More than 600 species in over 40 genera of the family Theaceae have been identified worldwide. Accurate identification of Theaceae plants helps maintain order in the market and plays a vital role in the sustainable utilization of germplasm resources. DNA barcoding, currently one of the most promising species identification technologies, enables rapid, accurate and repeatable discrimination of species. In this study, genetic information was analyzed for four single chloroplast DNA sequences (matK, rbcL, ndhF and ycf1) and six combined gene sequences, and matK + ndhF + ycf1 was identified as the optimal combined candidate gene sequence for DNA barcodes. Subsequently, phylogenetic analysis based on genetic distance was performed to study the phylogenetic relationships of Theaceae plants and to evaluate the species identification accuracy of matK + ndhF + ycf1. Lastly, species-specific DNA barcodes were designed by searching for variable sites (one type of single nucleotide polymorphism site) to accurately identify Camellia amplexicaulis, Franklinia alatamaha, Gordonia brandegeei and Stewartia micrantha. The previous methods for screening and testing candidate gene sequences were optimized and extended, and the process of making visual DNA barcodes was standardized. In addition, DNA barcoding increased the accuracy of species identification and was analyzed in accordance with theories of population genetics (e.g., the neutral theory of molecular evolution). The results of the study lay a basis for the identification and protection of Theaceae species and germplasm resources.
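The search for species-specific variable sites mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `species_specific_sites`, the rule "a base carried by the target species and by no other aligned sequence", and the toy sequences are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def species_specific_sites(
    alignment: Dict[str, str], target: str
) -> List[Tuple[int, str]]:
    """Return (1-based position, base) pairs where `target` carries a base
    that no other sequence in the alignment has at that column."""
    target_seq = alignment[target]
    others = [seq for name, seq in alignment.items() if name != target]
    if any(len(seq) != len(target_seq) for seq in others):
        raise ValueError("all sequences must be aligned to the same length")

    sites = []
    for i, base in enumerate(target_seq):
        if base not in "ACGT":
            continue  # skip gaps and ambiguity codes in the target
        # Note: a gap in another sequence counts as "different" here.
        if all(seq[i] != base for seq in others):
            sites.append((i + 1, base))
    return sites


# Toy alignment; the sequences are invented, not real chloroplast data.
toy = {
    "Camellia amplexicaulis": "ATGCCGTA",
    "Franklinia alatamaha": "ATGACGTA",
    "Gordonia brandegeei": "ATGACGTT",
    "Stewartia micrantha": "ATGACGTA",
}
print(species_specific_sites(toy, "Camellia amplexicaulis"))
```

In practice the input would be a multiple sequence alignment of the candidate region, and any site reported this way would still need to be checked against additional accessions of the same species.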

Data availability

The data supporting the findings of this study are provided in the manuscript and its supplementary material.

Abbreviations

cpDNA: Chloroplast DNA
C sites: Conserved sites
V sites: Variable sites
Pi sites: Parsimony informative sites
S sites: Singleton sites
SNP sites: Single nucleotide polymorphism sites
ii: Identical pairs
si: Transitional pairs
sv: Transversional pairs
R value: Ratio of transitional pairs to transversional pairs
θ: Nucleotide substitution rate
Hd: Haplotype diversity
S: Segregating sites
π: Nucleotide diversity
Fu’s Fs: Representative value of neutrality tests
Tajima’s D: Representative value of neutrality tests
P value: Hypothetical probability
ML tree: Maximum likelihood tree
NJ tree: Neighbor joining tree
Ca: Camellia amplexicaulis
Fa: Franklinia alatamaha
Gb: Gordonia brandegeei
Sm: Stewartia micrantha
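Several of the abbreviations above have simple operational definitions. The Python sketch below shows one way to count C, V, Pi and S sites and to compute π and Hd from a set of aligned sequences. The site definitions follow the convention commonly used in MEGA (a variable site is parsimony informative when at least two bases each occur at least twice, otherwise it is a singleton), which is an assumption here; the function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def classify_sites(seqs: List[str]) -> Dict[str, int]:
    """Count conserved (C), variable (V), parsimony-informative (Pi)
    and singleton (S) columns in a list of aligned sequences."""
    counts = {"C": 0, "V": 0, "Pi": 0, "S": 0}
    for column in zip(*seqs):
        if any(base not in "ACGT" for base in column):
            continue  # ignore columns with gaps or ambiguity codes
        freq = Counter(column)
        if len(freq) == 1:
            counts["C"] += 1
            continue
        counts["V"] += 1
        repeated = sum(1 for n in freq.values() if n >= 2)
        counts["Pi" if repeated >= 2 else "S"] += 1
    return counts


def nucleotide_diversity(seqs: List[str]) -> float:
    """pi: mean proportion of differing sites over all sequence pairs.
    Every column is compared as-is, so gaps should be removed beforehand."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) / length for x, y in pairs)
    return total / len(pairs)


def haplotype_diversity(seqs: List[str]) -> float:
    """Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


# Toy aligned sequences, invented for illustration.
toy_seqs = ["AAGTC", "AAGTC", "ACGTA", "ACGTT"]
print(classify_sites(toy_seqs))
print(nucleotide_diversity(toy_seqs), haplotype_diversity(toy_seqs))
```

With these definitions the number of variable sites always equals the sum of parsimony-informative and singleton sites, which is a useful sanity check when comparing candidate regions.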

Acknowledgements

The authors are grateful to Dr. Shuai Hu, School of Life Sciences, Tsinghua University, for critical review of this manuscript.

Funding

The authors acknowledge funding for this work from the Key Research & Development Project of Hunan Provincial Department of Science and Technology (2019NK2081), the National Natural Science Foundation of China (31872866) and the National Key Research and Development Program of China (2017YFF0210301).

Author information

Contributions

XHG and YLL designed the research. SJ, FLC, PQ, HX, GP and YLL collected and analyzed data. SJ, FLC, YLL and XHG wrote the main manuscript text. SJ prepared all figures and tables. All authors read and approved the manuscript.

Corresponding authors

Correspondence to Yongliang Li or Xinhong Guo.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Fig. S1

The ML tree of Theaceae plants inferred from matK + rbcL + ycf1 sequences based on the K2P model. The bootstrap values of all tree branches were greater than 75%. Red triangles on branches represent bootstrap values; the larger the triangle, the higher the bootstrap value. Branch lengths, i.e. evolutionary distances, are displayed on the main branches to four significant digits. (TIF 2747 kb)
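For readers unfamiliar with the K2P (Kimura two-parameter) model named in the caption, the pairwise distance it defines is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The Python sketch below computes this distance for two aligned sequences; the function name and the toy sequences are hypothetical, and tree-building software applies the model in its own way.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences."""
    compared = transitions = transversions = 0
    for a, b in zip(seq1, seq2):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy pair: one transition (A/G) and one transversion (A/C) in ten sites.
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAC")
print(round(d, 4))
```

The corrected distance exceeds the raw proportion of differing sites (0.2 in the toy pair) because the model accounts for multiple substitutions at the same site.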

Fig. S2

The Bayes tree of Theaceae plants inferred from matK + rbcL + ycf1 sequences based on the K2P model. The bootstrap values of all tree branches were greater than 75%. Red triangles on branches represent bootstrap values; the larger the triangle, the higher the bootstrap value. Branch lengths, i.e. evolutionary distances, are displayed on the main branches to four significant digits. (TIF 3597 kb)

Fig. S3

Combinatorial super DNA barcodes employed as substitute sequences, including one-dimensional and two-dimensional DNA barcodes of Theaceae plants based on matK, ndhF and ycf1. In the one-dimensional DNA barcodes, bases A, T, C and G are shown in green, red, blue and black, respectively. (TIF 6484 kb)
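The colour scheme described for the one-dimensional barcodes can be reproduced with a short Python sketch that draws one vertical bar per base as SVG. Only the base-to-colour mapping comes from the caption; the function name, bar dimensions and the choice of SVG output are assumptions, and the two-dimensional barcodes are not covered.

```python
# Base-to-colour mapping as stated in the caption of Fig. S3.
BASE_COLORS = {"A": "green", "T": "red", "C": "blue", "G": "black"}


def barcode_svg(sequence: str, bar_width: int = 2, height: int = 40) -> str:
    """Render a sequence as a one-dimensional colour barcode (SVG string)."""
    bars = []
    for i, base in enumerate(sequence.upper()):
        # Assumption: gaps and unknown bases are drawn as white bars.
        color = BASE_COLORS.get(base, "white")
        bars.append(
            f'<rect x="{i * bar_width}" y="0" width="{bar_width}" '
            f'height="{height}" fill="{color}"/>'
        )
    width = len(sequence) * bar_width
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">' + "".join(bars) + "</svg>"
    )


svg = barcode_svg("ATCG")
print(svg)
```

Writing the returned string to a file with an `.svg` extension gives an image that can be opened in a browser.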

Table S1

Accession numbers, version numbers, accepted names, synonyms and definition of matK (single gene sequence) in Theaceae plants on the NCBI online database. (XLSX 19 kb)

Table S2

Accession numbers, version numbers, accepted names, synonyms and definition of rbcL (single gene sequence) in Theaceae plants on the NCBI online database. (XLSX 17 kb)

Table S3

Accession numbers, version numbers, accepted names, synonyms and definition of ndhF (single gene sequence) in Theaceae plants on the NCBI online database. (XLSX 19 kb)

Table S4

Accession numbers, version numbers, accepted names, synonyms and definition of ycf1 (single gene sequence) in Theaceae plants on the NCBI online database. (XLSX 17 kb)

Table S5

Accession numbers, version numbers, accepted names, synonyms and definition of matK + rbcL (combined gene sequence) in Theaceae plants on the NCBI online database. (XLSX 17 kb)

Table S6

Accession numbers, version numbers, accepted names, synonyms and definition of matK + ndhF (combined gene sequence) in Theaceae plants on the NCBI online database. (XLSX 16 kb)

Table S7

Accession numbers, version numbers, accepted names, synonyms and definition of matK + ycf1 (combined gene sequence) in Theaceae plants on the NCBI online database. (XLSX 16 kb)

Table S8

Accession numbers, version numbers, accepted names, synonyms and definition of rbcL + ndhF (combined gene sequence) in Theaceae plants on the NCBI online database. (XLSX 16 kb)

Table S9

Accession numbers, version numbers, accepted names, synonyms and definition of rbcL + ycf1 (combined gene sequence) in Theaceae plants on the NCBI online database. (XLSX 16 kb)

Table S10

Accession numbers, version numbers, accepted names, synonyms and definition of ndhF + ycf1 (combined gene sequence) in Theaceae plants on the NCBI online database. (XLSX 16 kb)

Table S11

Average AT and GC content at different coding positions of codons in Theaceae plants. (DOCX 17 kb)
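As an illustration of the quantity summarised in Table S11, the Python sketch below computes GC content separately for the first, second and third codon positions of a coding sequence; AT content is one minus each value. It assumes the sequence starts in frame at codon position 1, and the function name and toy sequence are hypothetical.

```python
from typing import Dict, Optional


def gc_content_by_codon_position(cds: str) -> Dict[int, Optional[float]]:
    """GC fraction at codon positions 1, 2 and 3 of an in-frame sequence."""
    result: Dict[int, Optional[float]] = {}
    for offset in range(3):
        # Every third base starting at this offset; gaps/ambiguities dropped.
        bases = [b for b in cds.upper()[offset::3] if b in "ACGT"]
        gc = sum(b in "GC" for b in bases)
        result[offset + 1] = gc / len(bases) if bases else None
    return result


# Toy coding sequence (ATG GCC TAA), invented for illustration.
gc = gc_content_by_codon_position("ATGGCCTAA")
print(gc)
```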

Table S12

The mean number of identical pairs, transitional pairs and transversional pairs at different codon positions, and the ratio of transitional pairs to transversional pairs (R value), in Theaceae plants. (DOCX 21 kb)
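The counts reported in Table S12 can be illustrated for a single pair of sequences with the Python sketch below, which tallies identical (ii), transitional (si) and transversional (sv) pairs by codon position and derives the R value as si/sv. It assumes the alignment is in frame from codon position 1; the function names and toy sequences are hypothetical, and the table itself reports means over many sequence pairs.

```python
from typing import Dict, Optional

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def pair_counts_by_codon_position(seq1: str, seq2: str) -> Dict[int, Dict[str, int]]:
    """Count ii, si and sv pairs at codon positions 1, 2 and 3."""
    counts = {pos: {"ii": 0, "si": 0, "sv": 0} for pos in (1, 2, 3)}
    for i, (a, b) in enumerate(zip(seq1, seq2)):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        pos = i % 3 + 1
        if a == b:
            counts[pos]["ii"] += 1
        elif {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            counts[pos]["si"] += 1
        else:
            counts[pos]["sv"] += 1
    return counts


def r_value(counts: Dict[int, Dict[str, int]]) -> Optional[float]:
    """R = si / sv over all positions; None when there are no transversions."""
    si = sum(c["si"] for c in counts.values())
    sv = sum(c["sv"] for c in counts.values())
    return si / sv if sv else None


# Toy in-frame pair, invented for illustration.
pair = pair_counts_by_codon_position("ATGGCATTA", "ATGACTTTC")
print(pair, r_value(pair))
```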

Table S13

Accession numbers, version numbers, accepted names, synonyms and definition of matK + ndhF + ycf1 (combined gene sequence) in Theaceae plants on the NCBI online database. (XLSX 15 kb)

Table S14

NCBI Blast output of alternative candidate gene sequence segments. (DOCX 18 kb)

About this article

Cite this article

Jiang, S., Chen, F., Qin, P. et al. The specific DNA barcodes based on chloroplast genes for species identification of Theaceae plants. Physiol Mol Biol Plants 28, 837–848 (2022). https://doi.org/10.1007/s12298-022-01175-7
