Endorsement and phylogenetic analysis of some Fabaceae plants based on DNA barcoding

Background DNA barcoding has been considered a tool to facilitate species identification because of its simplicity and high accuracy compared with the complexity and subjective biases of morphological identification of taxa. The chloroplast maturase K gene (MatK) is vital in plants, where it is involved in group II intron splicing. The main objective of this study was to determine the relative utility of the MatK chloroplast gene for barcoding 15 legume species as a simple and accurate complement to morphological identification of taxa.
Methods and Results MatK gene sequences were submitted to GenBank, and accession numbers were obtained for sequences ranging in length from 730 to 1545 nucleotides. These DNA sequences were aligned with database sequences using the PROMALS server, the Clustal Omega server and the BioEdit program. Maximum likelihood and neighbor-joining algorithms were employed for constructing phylogenies. Overall, the phylogenetic trees and evolutionary distances obtained for the individual dataset of each species agreed with the combined phylogenetic tree of all species, which consisted of two clades. The first clade comprised (Enterolobium contortisiliquum, Albizia lebbeck), Acacia saligna, Leucaena leucocephala, Dichrostachys cinerea, (Delonix regia, Parkinsonia aculeata), (Senna surattensis, Cassia fistula, Cassia javanica) and Schotia brachypetala, with the species grouped in parentheses most closely related to each other. The remaining four species, Erythrina humeana and (Sophora secundiflora, Dalbergia sissoo, Tipuana tipu), constituted the second clade.
Conclusion These sequences could be used for single nucleotide polymorphism analysis or, in part, for PCR-based DNA fragment analysis in plant systematics. Therefore, the MatK gene is considered a promising candidate for DNA barcoding in the plant family Fabaceae and provides a clear picture of relationships within the family.


Introduction
Fabaceae, commonly known as the legume family, is a large and economically vital family of flowering plants [1][2][3][4][5]. It comprises 730 genera and more than 19,400 species, including over 490 medicinal plant species [5][6][7][8][9]. Identifying Mediterranean legume crops from morphological characteristics alone has proven difficult and often impossible [10][11][12][13]. A DNA-based technique would therefore offer accurate information and facilitate discrimination between species. DNA barcoding is a new, efficient, quick, low-cost and standardized technique for the rapid and accurate identification and evaluation of plant and animal species based on the DNA sequence of a small fragment of the genome [14][15][16][17][18]. As a biological tool, DNA barcoding can help to detect species, to quickly identify species that are possibly new to science, and to address essential ecological and evolutionary questions [19][20][21][22][23][24][25]. DNA barcodes are also frequently promoted for their ability to make scientific information and new knowledge more accessible to the public and to non-specialists [26,27]. The short DNA sequences used in DNA barcoding serve as molecular markers to identify diversity between plant and animal species [28]; they are also used to assign unknown samples to a taxonomic group and to document plant biodiversity [29]. DNA barcoding is a potential tool for detecting errors in species identification, because similarity-based approaches that combine DNA barcoding with morphology can resolve misidentifications based on morphology alone [30][31][32]. DNA barcoding could also reduce the limitations of morphological characteristics and speed up the identification of plant and animal species, since it can identify organisms at any stage of growth.
DNA barcodes are designed to create a shared community resource of DNA sequences that is used for the identification or taxonomic classification of organisms [33]. The use of DNA barcodes as a tool for plant and animal identification depends on the establishment of high-quality reference sequence databases [34], which cannot always distinguish between closely related species of land plants or fungi.
The MatK gene (1500 bp in length) is located within the intron of the chloroplast tRNA-lysine gene (trnK) and encodes a maturase, a protein involved in group II intron splicing; in plants, the trnK intron carries the MatK open reading frame (ORF). This gene has a high rate of substitution [3], a large proportion of variation at the nucleic acid level at the first and second codon positions, a low transition/transversion ratio, and mutationally conserved regions. Previous data were used to identify molecular markers for determining the genus and species of these taxa and to provide valuable information for both conventional and molecular studies [11]. The current study aims to evaluate the capacity and efficiency of the MatK gene as a standard plant barcode marker, to document and identify 45 plant specimens belonging to 15 species of Fabaceae, and to carry out functional annotation, homology modeling and sequence analysis to permit more efficient use of these sequences across different plant species.

Plant materials
Forty-five samples (three replicates for each of the 15 species shown in Fig. 1) were collected from Antoniadis Garden (N 29" 56′ 55, E 18" 12′ 31), Alexandria, Egypt, between July 2019 and January 2020. The primers used for PCR amplification were F: 5'-CGT ACA GTA CTT TTG TGT TTA CGA G-3' (Tm 53.9; GC% 40) and R: 5'-ACC CAG TCC ATC TGG AAA TCT TGG TTC-3' (Tm 60.4; GC% 48). The PCR products were run on a 1.0% agarose gel in 1X TAE buffer containing 0.5 µg/mL ethidium bromide (Fig. 3). PCR products were purified using a purification mini kit (iNtron Biotechnology) before being sequenced by the dideoxynucleotide chain termination method with a DNA sequencer (Applied Biosystems® 3500 and 3500xL Genetic Analyzers) and a BigDye Terminator version 3.1 Cycle Sequencing RR-100 Kit (Applied Biosystems). The sequences were submitted to the DDBJ/EMBL/GenBank database. Genus and species data were obtained from the taxonomy database of the National Center for Biotechnology Information (NCBI).
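The GC% values reported for the two primers can be checked directly from the sequences quoted above. The following is a minimal sketch; the function name gc_percent is introduced here for illustration only, and the melting temperatures are not recomputed because the method used to obtain them is not stated.

```python
def gc_percent(primer: str) -> float:
    """Return the G + C percentage of a primer, ignoring spaces and case."""
    bases = [b for b in primer.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("primer contains no A/C/G/T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Primer sequences as printed in the text (5' to 3').
FORWARD = "CGT ACA GTA CTT TTG TGT TTA CGA G"
REVERSE = "ACC CAG TCC ATC TGG AAA TCT TGG TTC"

if __name__ == "__main__":
    for name, seq in (("F", FORWARD), ("R", REVERSE)):
        print(name, round(gc_percent(seq), 1))
```

Rounded to whole numbers, the two results agree with the GC% values of 40 and 48 given for the primers.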

Sequence analysis
Sequence analysis was carried out on a single grouped dataset containing all Fabaceae species for which MatK sequences are available in GenBank, in order to assess inter-species and inter-generic variation. MatK sequences of Fabaceae species were retrieved from NCBI in FASTA format. Multiple sequence alignments of the MatK gene from different species were produced with the PROMALS server [35], the Clustal Omega server [36], and the offline programs BioEdit [37] and MEGA 11 [38] to find conserved regions. MEGA 11 offers a large collection of methods and tools for computational molecular evolution, including building timetrees of species, pathogens and gene families with rapid relaxed-clock methods, and estimating divergence times and confidence intervals from node calibrations (node-dating) or sequence sampling dates (tip-dating). PROMALS is reported to be up to 30% more accurate than other leading alignment methods for distantly related sequences. Clustal Omega is a multiple sequence alignment program that aligns three or more sequences using seeded guide trees and HMM profile-profile methods. BioEdit is a user-friendly biological sequence alignment editor that provides basic functions for editing, aligning, manipulating and analyzing sequences.

Molecular evolution and phylogenetic analysis
The neighbor-joining method was used to infer the evolutionary history. The maximum likelihood approach to constructing a phylogenetic tree searches for the topology and branch lengths that give the highest probability of observing the sequences in the current data. For the phylogenetic evaluation, the MAFFT server [39], the Clustal Omega server and MEGA 11 were used. MEGA was used to analyze the sequence data with the neighbor-joining technique and the unweighted pair group method with arithmetic mean (UPGMA). The DNADIST program of PHYLIP was used to calculate distances. NJplot was used for bootstrapping and decay analysis. Parsimony analyses and the different clades were determined with MEGA.
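Distance-based methods such as neighbor-joining and UPGMA start from a matrix of pairwise evolutionary distances. As an illustration only, the sketch below computes the uncorrected p-distance and the Jukes-Cantor (JC69) correction for aligned sequences. JC69 is used here as a simple stand-in and is an assumption: it is not the model applied by DNADIST or MEGA in this study, and the toy sequences and function names are hypothetical.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap or ambiguity."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # sequences are saturated; the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs: dict) -> dict:
    """Return {(name_i, name_j): JC69 distance} for every ordered pair."""
    return {(i, j): jc69_distance(seqs[i], seqs[j]) for i in seqs for j in seqs}


if __name__ == "__main__":
    # Hypothetical aligned fragments, not data from this study.
    toy = {"seq1": "ACGTACGT", "seq2": "ACGTACGA", "seq3": "ACCTAGGA"}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 4))
```

The resulting matrix is the kind of input that a neighbor-joining or UPGMA implementation clusters into a tree.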

DNA extraction and PCR amplification
The quality of the extracted DNA was checked by 1% agarose gel electrophoresis, and no fragmentation was observed. The quantity of the extracted DNA was determined with a NanoDrop spectrophotometer; concentrations ranged from 30 to 50 ng/μl. The extracted DNA was used directly for PCR amplification of the MatK gene, which yielded a fragment of 900 bp.
Developments in DNA sequencing methods have allowed the genomes of numerous organisms to be described quickly. Comparisons of the DNA sequences of several species are providing useful knowledge about their taxonomy, gene makeup and utilization. The current study used DNA sequence polymorphisms of the chloroplast MatK gene, which is much more variable than many other genes, in fifteen plant species belonging to different genera of the family Fabaceae (Table 1). With these data we aimed to contribute to knowledge of the major evolutionary relationships between the studied genera and species (clades) and to discuss the application of MatK in molecular evolution. The chloroplast MatK region proved useful as a DNA marker. The sequences of the fifteen species, from fifteen genera, were deposited in GenBank, and accession numbers were obtained for the respective plant species, which differed in the number of conserved domains, segment length and average entropy (Hx) (Table 1).
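The text does not state how the average entropy (Hx) in Table 1 was calculated. One common definition, assumed here, is the Shannon entropy of each alignment column, H = -Σ f ln f over the residue frequencies f, averaged across columns. The sketch below follows that assumption; the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def column_entropy(column: str) -> float:
    """Shannon entropy (natural log) of one alignment column.

    Every symbol, including a gap character, is counted as its own state.
    """
    counts = Counter(column.upper())
    total = sum(counts.values())
    h = -sum((n / total) * math.log(n / total) for n in counts.values())
    return max(0.0, h)  # avoids returning -0.0 for invariant columns


def mean_entropy(alignment: Sequence[str]) -> float:
    """Average column entropy over an alignment of equal-length sequences."""
    columns = ["".join(col) for col in zip(*alignment)]
    if not columns:
        raise ValueError("empty alignment")
    return sum(column_entropy(c) for c in columns) / len(columns)


if __name__ == "__main__":
    # Hypothetical four-sequence alignment, not data from this study.
    toy = ["ACGT", "ACGA", "ACCA", "ACCT"]
    print(round(mean_entropy(toy), 4))
```

A fully conserved column has an entropy of zero, so lower averages indicate more conserved segments.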

Phylogenetic analysis of collected plants
Multiple sequence alignments showed varying numbers of indels in the MatK gene. Using the neighbor-joining, UPGMA and maximum likelihood methods, the 15 plant species were resolved into individual clades according to their evolutionary distances. The alignment of the MatK nucleotide sequences for Acacia saligna showed 15 conserved regions, 769 variable sites and 571 parsimony sites, with an overall mean distance of 2.85 (Table 2). The combined tree showed two groups (clades). In Group I, Acacia saligna was closely related to species of other genera of the same family (Fabaceae), such as Enterolobium, Pararchidendron, Archidendron, Samanea, Hydrochorea, Balizia and Abarema (Fig. 4). Other Acacia species were placed close together but were separated from other genera such as Falcataria, Pararchidendron and Lacacia.
In addition, the aligned MatK dataset was 793 nucleotide sites long, of which 102 sites were potentially parsimony informative. Enterolobium contortisiliquum was closely related to several species of the genus Acacia according to the maximum likelihood analysis (Fig. 4). The length of MatK varied from 750 bp in Albizia lebbeck (the shorter length for this species is due to incomplete sequencing of the sequence retrieved from GenBank) to 813 bp in other genera (Enterolobium, Acacia, Senegalia, Cojoba, Samanea, Hydrochorea, Balizia and Abarema). Maximum likelihood and neighbor-joining analyses of the dataset resulted in a tree with two groups. The clades established in the trees were mainly mixtures of numerous species. Consequently, creating a local barcode database will be useful for a broad range of potential ecological purposes, including the building of community phylogenies [40]. Group I has three clusters comprising several genera (Albizia, Enterolobium, Mariosousa, Archidendron, Samanea, Balizia and Abarema). Group II, in contrast, contains a single genus, Acacia, which is the most closely related to our Albizia lebbeck sample according to the partial MatK coding sequence (Fig. 4). The alignment of the Albizia lebbeck MatK nucleotide sequences revealed 649 variable sites and 359 parsimony sites, with an overall mean distance of 2.37 and an estimated transition/transversion bias (R) of 0.52 (Tables 2, 3).
Table 2 The homogeneity test of substitution patterns between sequences of fifteen studied species
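The alignment statistics quoted in this section (variable sites, parsimony-informative sites, transitions versus transversions) can be illustrated on a toy alignment. The sketch below uses the standard definitions: a site is variable if it shows at least two different bases, and parsimony informative if at least two bases each occur in at least two sequences. It reports raw counts only and is an assumption-laden simplification; it does not reproduce the model-based estimate of R produced by MEGA, and the sequences and function names are hypothetical.

```python
from collections import Counter

PURINES = {"A", "G"}


def site_classes(alignment):
    """Return (variable_sites, parsimony_informative_sites) for an alignment."""
    variable = informative = 0
    for col in zip(*(s.upper() for s in alignment)):
        counts = Counter(b for b in col if b in "ACGT")  # ignore gaps and Ns
        if len(counts) > 1:
            variable += 1
            # Informative: at least two states, each seen in two or more sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


def ts_tv_counts(a, b):
    """Count transitions and transversions between two aligned sequences."""
    ts = tv = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        if (x in PURINES) == (y in PURINES):
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ts, tv


if __name__ == "__main__":
    # Hypothetical alignment, not data from this study.
    toy = ["ACGTA", "ACGCA", "ATGCA", "ATGTG"]
    print(site_classes(toy))
    print(ts_tv_counts(toy[0], toy[3]))
```

In the toy alignment, three columns vary but only two are parsimony informative, because the last column has a base that appears in a single sequence.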

Furthermore, according to the phylogenetic analysis, the genera Cassia and Senna are closely related, and their species are more similar to each other than to any other studied species (Fig. 5). The phylogenetic tree was built with the neighbor-joining approach, and the evolutionary distances were calculated with the maximum composite likelihood approach. The combined trees showed two groups. Group I consisted of five clades representing different genera and species (Chamaecrista, Senna, Erythrophleum, Arapatiella and Dinizia). Group II showed two branches, each with many sub-branches, containing five clades with different species of the genus Senna. According to the MatK gene sequence, the collected plants Cassia fistula and Cassia javanica showed a high percentage of identity with seven different species of the genus Senna in the same clade (Fig. 5). They are also closely related to species of Erythrophleum and Arapatiella. On the other hand, the MatK sequence of the collected Senna surattensis has a high degree of similarity with many species of different genera in Fabaceae (Fig. 5); consequently, this species was used as a template to estimate the similarity between different species in the family.
Nevertheless, Delonix regia, the most studied of these species, shows good similarity to species of different genera of the Fabaceae family. Polymorphisms arising from indels or substitutions in the MatK sequence indicated that Delonix regia, Umtiza listeriana, Diptychandra aurantiaca, Moldenhawera blanchetiana, Schizolobium parahyba, Tachigali costaricensis, Arapatiella psilophylla and Parkinsonia africana evolved from a common ancestor (Fig. 6). In addition, Dichrostachys cinerea is closely related to species of the genera Leucaena, Senegalia, Falcataria and Prosopis (Fig. 6). Furthermore, when the same incremental method of adding informative sites was applied starting at the 5′-end of the MatK gene, completely different results were found. The consensus tree of the 15 most parsimonious trees showed unresolved clades until 250 informative sites were included. At that point, a single highly parsimonious tree was obtained, which was congruent with the topology of the stable tree obtained from the 3′-end. To identify the best DNA barcode marker for species documentation and traceability, genetic divergence values for all the confirmed loci were calculated in each analyzed group at different taxonomic levels, considering only fresh, morphologically identified samples. The results indicated that Delonix regia, Parkinsonia aculeata and Leucaena leucocephala are more similar to other species of their own genera and less similar to species of other genera of the Fabaceae family (Fig. 6). Parkinsonia aculeata was closely related to, and in the same clade as, Schizolobium parahyba, Diptychandra aurantiaca, Delonix regia, Conzattia multiflora and Colvillea racemosa (Fig. 6).
The current sequences showed little variation in guanine plus cytosine content (% G + C) relative to other MatK sequences. The nucleotide composition of MatK was A + T rich, with G + C content ranging from 30.4 to 34.8%. The NJ, ML and MP analyses all resulted in comparable trees for each of the data sets, although the trees from the various analyses often differed in unresolved nodes (polytomies). Analyses of samples of Parkinsonia aculeata, Schotia brachypetala, Sophora secundiflora and Tipuana tipu indicated that their MatK sequences diverged clearly from those of other Fabaceae species. Figure 7 shows the phylogenetic clusters constructed using ML and NJ. The differences observed in MatK separate several species; however, there is a wide range of intraspecific and interspecific variation. Furthermore, on the neighbor-joining phylogram, the Schotia group is a sister taxon to the Macrolobium group, and this relationship is found in 50% of the trees in this cladistic analysis (Fig. 7).
The last two species, Sophora secundiflora and Tipuana tipu, formed an independent clade, which confirmed their ambiguous position relative to the other genera of Fabaceae reported from combined cladistic analyses of chloroplast DNA restriction sites and morphology. Sophora secundiflora shared a common ancestor with Angylocalyx braunii, Zollernia splendens, Ormosia xylocarpa and Dermatophyllum secundiflorum. Tipuana tipu is in the same clade as species of the two genera Centrolobium and Pterocarpus (Fig. 7). Additionally, most of the Fabaceae species in the current research were found to have a unique MatK sequence. These results offer a valuable way to authenticate various species using MatK. The MatK sequences generated in this analysis will be used to construct reference sequence libraries, and sequences obtained from samples with known identities will be used to search the database.
Lastly, using BLAST1 and the nearest genetic distance approach, the species identities of query sequences can be determined from these data. In the MatK dataset, the nearest genetic distance approach achieved identification success rates of 96.45% to 99.68% at the species level for BLAST1 and distance discrimination.
Table 3 The estimation of evolutionary divergence between fifteen studied species
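The idea behind similarity-based identification can be sketched as a best-match search: a query barcode is compared with every sequence in a reference library and assigned to the reference with the highest identity. The example below is a minimal illustration under strong assumptions (pre-aligned sequences of equal length, hypothetical species labels and toy sequences); it is not the BLAST1 procedure used in the study.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical sites between two aligned sequences, ignoring gaps."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def best_match(query: str, references: dict) -> tuple:
    """Return (reference_name, percent_identity) for the closest reference."""
    hits = ((name, percent_identity(query, seq)) for name, seq in references.items())
    return max(hits, key=lambda hit: hit[1])


# Hypothetical reference library; labels and sequences are placeholders.
REFERENCES = {
    "species_A": "ACGTTGCAAT",
    "species_B": "ACGTTGCTAT",
    "species_C": "ACCTTGGTAT",
}

if __name__ == "__main__":
    print(best_match("ACGTTGCTAT", REFERENCES))
```

In practice a minimum identity threshold would also be needed, so that a query with no close reference is reported as unidentified rather than assigned to the nearest species.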

Discussion
Because plant genomes include several copies of MatK sequences, it is unclear whether the sequence obtained by PCR will be balanced and representative [41]. On the basis of our results, we suggest MatK as a potential barcode sequence in the Fabaceae family, as well as in a wider range of plant species. Utilizing MatK as a DNA barcode would extend our knowledge of phylogenetics and population genetics in Fabaceae species, as reviewed in [42][43][44][45]. We also recommend that MatK be used as a DNA barcode sequence to overcome difficulties in the classification of Fabaceae genera and species [44,46]. MatK might serve as a starting point for quality control and assurance of plant materials used in research, manufacturing, customs and forensics.
Genetic divergences showed MatK to be a sufficiently variable DNA region between Fabaceae species, and it demonstrated a greater potential for effective discrimination. MatK can be a powerful taxonomic marker for identifying species and resolving taxonomic issues [30,32,41]. For instance, the MatK sequence of Enterolobium contortisiliquum is highly similar to that of Albizia lebbeck, and our results indicate that in the genus Cassia, in which the species were poorly resolved, MatK was still able to distinguish among some confusing species [47]. The evolutionary distances for the 15 plant species, which were separated into distinct clades, were analyzed using the maximum likelihood and neighbor-joining methods, which discriminated most of the species better than previous techniques [48]. Identification with the MatK region agreed 100% with morphological identification at the species level (Fig. 1); for the set of plants studied, this single locus appears to provide accurate species identification. Short sequence length, universality and unique identifiers are three features of a common barcode [48,49]. According to our results on the sequence length and composition of the MatK barcode for the 15 plant species, MatK regions have a high rate of nucleotide substitution, as shown by [50], or the locus remodeling ring [51]. Alternate primer sequences may increase the success rate of MatK amplification for some of the current taxa, making it a barcoding locus. The species in which the MatK region was amplified, however, had wide taxonomic coverage in the Fabaceae family, indicating that the conserved sequence of the locus is notable.
Consequently, the partial MatK sequences were further used to investigate the evolutionary relationships of the selected plants. Using the neighbor-joining approach, the 15 plant species were divided into two clades according to their evolutionary distances. The first clade comprised (Enterolobium contortisiliquum, Albizia lebbeck), Acacia saligna, Leucaena leucocephala, Dichrostachys cinerea, (Delonix regia, Parkinsonia aculeata), (Senna surattensis, Cassia fistula, Cassia javanica) and Schotia brachypetala, with the species grouped in parentheses most closely related to each other. The remaining four species, Erythrina humeana and (Sophora secundiflora, Dalbergia sissoo, Tipuana tipu), constituted the second clade. The results are encouraging and provide a backbone of knowledge for the data set. As additional species become accessible, more research on species resolution within a genus may be undertaken.

Conclusion
In the current study, DNA barcoding with the MatK chloroplast gene was applied to fifteen legume trees using both single-region and multiregional approaches. The chloroplast gene sequences obtained were submitted to GenBank, and fifteen accession numbers were recorded (LC602060, LC602154, LC602263, LC603347, LC603655, LC603845, LC603846, LC603847, LC604717, LC604718, LC605994, LC604799, LC605995, LC606468, LC606469), with sequence lengths ranging from 730 to 1545 nucleotides. The results indicated that the phylogenetic trees and evolutionary distances obtained for the individual dataset of each studied species agreed with the combined phylogenetic tree of all species, which consisted of two clades. The first clade comprised (Enterolobium contortisiliquum, Albizia lebbeck), A. saligna, Leucaena leucocephala, Dichrostachys cinerea, (Delonix regia, Parkinsonia aculeata), (Senna surattensis, C. fistula, C. javanica) and Schotia brachypetala, with the species grouped in parentheses most closely related to each other. The remaining four species, Erythrina humeana and (Sophora secundiflora, Dalbergia sissoo, Tipuana tipu), constituted the second clade. Finally, it can be concluded that the MatK gene is a promising candidate for DNA barcoding in the plant family Fabaceae and provides a clear picture of relationships within the family.

Conflict of interest
The authors declare no conflict of interest.

Ethical approval Not applicable.
Informed consent Not applicable.

Consent to Publish Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.