Abstract
More than 600 species in over 40 genera of the family Theaceae have been identified worldwide. Accurate identification of Theaceae plants helps maintain order in the market and plays a vital role in the sustainable utilization of germplasm resources. DNA barcoding, currently one of the most promising species identification technologies, enables rapid, accurate and repeatable discrimination of species. In this study, the genetic information of four single chloroplast DNA sequences (matK, rbcL, ndhF and ycf1) and six combined gene sequences was analyzed, and matK + ndhF + ycf1 was identified as the optimal combined candidate sequence for DNA barcoding. Subsequently, phylogenetic analysis based on genetic distance was performed to study the phylogenetic relationships of Theaceae plants and to evaluate the species identification accuracy of matK + ndhF + ycf1. Lastly, species-specific DNA barcodes were designed by searching for variable sites (one type of single nucleotide polymorphism site) for the accurate identification of Camellia amplexicaulis, Franklinia alatamaha, Gordonia brandegeei and Stewartia micrantha. Previous methods of screening and testing candidate gene sequences were optimized and extended, and the process of making visual DNA barcodes was standardized. In addition, DNA barcoding increased the accuracy of species identification and was analyzed in accordance with theories of population genetics (e.g., the neutral theory of molecular evolution). The results of this study lay a basis for the identification and protection of Theaceae species and germplasm resources.
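The search for species-specific variable sites described above can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and treats a site as species-specific when the target carries a base that no other sequence shares. The function name `species_specific_sites` and the toy sequences are hypothetical and are not taken from the study.

```python
def species_specific_sites(alignment, target):
    """Return (position, base) pairs where `target` has a base no other sequence has.

    `alignment` maps a label to an aligned sequence of equal length.
    Positions are 1-based. Columns containing gaps or ambiguous bases are skipped.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")

    target_seq = alignment[target].upper()
    others = [seq.upper() for name, seq in alignment.items() if name != target]

    sites = []
    for i, base in enumerate(target_seq):
        column = [base] + [seq[i] for seq in others]
        if any(b not in "ACGT" for b in column):
            continue  # skip gaps and ambiguity codes
        if all(seq[i] != base for seq in others):
            sites.append((i + 1, base))
    return sites


# Toy alignment with made-up sequences (illustration only, not study data).
toy = {
    "Ca": "ATGCTAGCTA",
    "Fa": "ATGCTAGCTG",
    "Gb": "ATGATAGCTG",
    "Sm": "ATGCTAGCTG",
}
print(species_specific_sites(toy, "Ca"))
print(species_specific_sites(toy, "Gb"))
```

In practice a candidate site found this way would still need to be checked against a broader reference set, since a base that is unique within a small alignment may not be unique across the family.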
Data availability
The data supporting the findings of this study are provided in the manuscript and its supplementary material.
Abbreviations
- cpDNA: Chloroplast DNA
- C sites: Conserved sites
- V sites: Variable sites
- Pi sites: Parsimony informative sites
- S sites: Singleton sites
- SNP sites: Single nucleotide polymorphism sites
- ii: Identical pairs
- si: Transitional pairs
- sv: Transversional pairs
- R value: Transitional pairs/Transversional pairs
- θ: Nucleotide substitution rate
- Hd: Haplotype diversity
- S: Segregating sites
- π: Nucleotide diversity
- Fu’s Fs: Representative value of neutrality tests
- Tajima’s D: Representative value of neutrality tests
- P value: Hypothetical probability
- ML tree: Maximum likelihood tree
- NJ tree: Neighbor joining tree
- Ca: Camellia amplexicaulis
- Fa: Franklinia alatamaha
- Gb: Gordonia brandegeei
- Sm: Stewartia micrantha
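As an illustration of how the pair counts ii, si and sv, the R value and a K2P (Kimura two-parameter) distance relate to each other, the following minimal sketch works on two aligned sequences. It uses the standard K2P formula d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional pairs. The function names and the toy sequences are hypothetical, and the sketch is not the software pipeline used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def pair_counts(seq1, seq2):
    """Count identical (ii), transitional (si) and transversional (sv) pairs.

    Positions with gaps or ambiguous bases in either sequence are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ii = si = sv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        if a == b:
            ii += 1
        elif {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            si += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            sv += 1  # purine<->pyrimidine
    return ii, si, sv


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences."""
    ii, si, sv = pair_counts(seq1, seq2)
    n = ii + si + sv
    if n == 0:
        raise ValueError("no comparable sites")
    p = si / n  # proportion of transitional pairs
    q = sv / n  # proportion of transversional pairs
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Toy pair with one transition (A/G) and one transversion (T/A); not study data.
s1 = "ATGCATGCAT"
s2 = "ATGCGTGCAA"
ii, si, sv = pair_counts(s1, s2)
r_value = si / sv if sv else float("inf")
print(ii, si, sv, r_value)
print(round(k2p_distance(s1, s2), 4))
```

The logarithms are undefined when 1 - 2P - Q or 1 - 2Q is not positive, so very divergent sequence pairs would need separate handling.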
Acknowledgements
The authors are grateful to Dr. Shuai Hu, School of Life Sciences, Tsinghua University, for critical review of this manuscript.
Funding
This work was funded by the Key Research & Development Project of Hunan Provincial Department of Science and Technology (2019NK2081), the National Natural Science Foundation of China (31872866) and the National Key Research and Development Program of China (2017YFF0210301).
Author information
Contributions
XHG and YLL designed the research. SJ, FLC, PQ, HX, GP and YLL collected and analyzed data. SJ, FLC, YLL and XHG wrote the main manuscript text. SJ prepared all figures and tables. All authors read and approved the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Fig. S1
The ML tree of Theaceae plants constructed from matK + rbcL + ycf1 sequences based on the K2P model. The bootstrap values of all branches were greater than 75%. Red triangles on branches represent bootstrap values; the larger the triangle, the higher the bootstrap value. Branch lengths, i.e., evolutionary distances, are displayed on the main branches to four significant digits. (TIF 2747 kb)
Fig. S2
The Bayes tree of Theaceae plants constructed from matK + rbcL + ycf1 sequences based on the K2P model. The bootstrap values of all branches were greater than 75%. Red triangles on branches represent bootstrap values; the larger the triangle, the higher the bootstrap value. Branch lengths, i.e., evolutionary distances, are displayed on the main branches to four significant digits. (TIF 3597 kb)
Fig. S3
Combinatorial super DNA barcodes employed as substitute sequences, including one-dimensional and two-dimensional DNA barcodes of Theaceae plants based on matK, ndhF and ycf1. In the one-dimensional DNA barcodes, bases A, T, C and G are shown in green, red, blue and black, respectively. (TIF 6484 kb)
Table S1
Accession numbers, version numbers, accepted names, synonyms and definition of matK (single gene sequence) in Theaceae plants on the NCBI online database. (XLSX 19 kb)
Table S2
Accession numbers, version numbers, accepted names, synonyms and definition of rbcL (single gene sequence) in Theaceae plants on the NCBI online database. (XLSX 17 kb)
Table S3
Accession numbers, version numbers, accepted names, synonyms and definition of ndhF (single gene sequence) in Theaceae plants on the NCBI online database. (XLSX 19 kb)
Table S4
Accession numbers, version numbers, accepted names, synonyms and definition of ycf1 (single gene sequence) in Theaceae plants on the NCBI online database. (XLSX 17 kb)
Table S5
Accession numbers, version numbers, accepted names, synonyms and definition of matK + rbcL (combined gene sequence) in Theaceae plants on the NCBI online database. (XLSX 17 kb)
Table S6
Accession numbers, version numbers, accepted names, synonyms and definition of matK + ndhF (combined gene sequence) in Theaceae plants on the NCBI online database. (XLSX 16 kb)
Table S7
Accession numbers, version numbers, accepted names, synonyms and definition of matK + ycf1 (combined gene sequence) in Theaceae plants on the NCBI online database. (XLSX 16 kb)
Table S8
Accession numbers, version numbers, accepted names, synonyms and definition of rbcL + ndhF (combined gene sequence) in Theaceae plants on the NCBI online database. (XLSX 16 kb)
Table S9
Accession numbers, version numbers, accepted names, synonyms and definition of rbcL + ycf1 (combined gene sequence) in Theaceae plants on the NCBI online database. (XLSX 16 kb)
Table S10
Accession numbers, version numbers, accepted names, synonyms and definition of ndhF + ycf1 (combined gene sequence) in Theaceae plants on the NCBI online database. (XLSX 16 kb)
Table S11
Average AT and GC content at different coding positions of codons in Theaceae plants. (DOCX 17 kb)
Table S12
The mean numbers of identical pairs, transitions and transversions at different codon positions, and the ratio of transitional pairs to transversional pairs (R value), in Theaceae plants. (DOCX 21 kb)
Table S13
Accession numbers, version numbers, accepted names, synonyms and definition of matK + ndhF + ycf1 (combined gene sequence) in Theaceae plants on the NCBI online database. (XLSX 15 kb)
Table S14
NCBI Blast output of alternative candidate gene sequence segments. (DOCX 18 kb)
Cite this article
Jiang, S., Chen, F., Qin, P. et al. The specific DNA barcodes based on chloroplast genes for species identification of Theaceae plants. Physiol Mol Biol Plants 28, 837–848 (2022). https://doi.org/10.1007/s12298-022-01175-7
DOI: https://doi.org/10.1007/s12298-022-01175-7