Introduction

Monitoring local biodiversity is fundamental for the development of conservation and sustainability strategies. This task requires a reliable species database, which is often lacking or incomplete for many regions on Earth, particularly in the tropics, considered the richest in biodiversity1. Based on a recent estimate, there are about 8.7 million eukaryotic species on Earth, and more than 80% of plants remain to be described2. Classical morphological species identification often requires specimens in good condition with reproductive structures, which are not always easy to obtain in field studies. In addition, the high phenotypic plasticity of plants makes accurate identification difficult, so it frequently must be performed by a specialist in the taxon involved. DNA barcoding, a molecular approach to species identification, overcomes these drawbacks. This technique requires only a piece of tissue from the specimen. DNA is extracted from the tissue sample and amplified using universal primers. The short fragment of amplified DNA is then sequenced, and the sequence is compared with those already published in a DNA database, such as GenBank at the National Center for Biotechnology Information (NCBI), or the Barcode of Life Data system (BOLD), designed explicitly for DNA barcoding3. If the specimen's DNA sequence matches a published sequence with ≥ 99% identity, both sequences are concluded to belong to the same species. Hebert et al. (2003) introduced this technique4.

DNA barcoding using cytochrome oxidase I (CO1) has been successfully applied to animal species; in plants, however, CO1 did not work, and more research is required. A sine qua non for species identification using DNA barcoding is the existence of a reliable published sequence. The Consortium for the Barcode of Life's (CBOL) plant working group evaluated seven candidate plastid DNA regions based on universality, sequence quality, and species discrimination. CBOL recommended the 2-locus combination rbcL + matK as the core plant barcode5. Other studies suggest using additional loci, including non-coding plastid regions, such as the intergenic spacer trnH-psbA6,7,8, and the nuclear marker ITS9,10. However, no single barcode has proved universal in plants. Therefore, several authors propose regional barcodes for a wide range of ecological and conservation applications, since specimens are most likely to be identified using a restricted reference database11,12.

In Mexico, tropical montane cloud forests (TMCFs) account for 1% of the country's land area but have a higher concentration of plant and animal diversity than any other Mexican ecosystem13. Pteridophytes are well represented in TMCFs in Southeast Mexico14. Oaxaca is the state where the greatest diversity of pteridophytes has been observed15, but some places require increased sample collection16. Areas of difficult access, like the Mixteca highlands, had only one reported species15. To our knowledge, no new records have been reported in this area since that publication. The purpose of this study is to evaluate the performance of three plastid barcodes (the partial gene rbcLa, matK, and the intergenic spacer trnH-psbA) using standard primers and the three CBOL criteria (universality, quality, and species discrimination)5, in order to build a barcode library of pteridophytes in the Mixteca highlands, Oaxaca, Mexico.

Material and methods

Study site, determination, and vouchers

Twenty-nine samples of ferns from 11 families and two samples of lycopods were collected in a fragmented cloud forest at San Miguel Cuevas, Santiago Juxtlahuaca Municipality, Oaxaca state, México (17° 15′ 00.96″ N and 98° 02′ 57.34″ W), at a mean altitude of 2187 m asl. The climate is temperate to semi-warm, and soils are rich in organic matter. Local authorities granted permission to visit their forests and to collect parts of the plants. Fresh plant vouchers were determined by Dr. Daniel Tejero Díez from UNAM FES Iztacala following Mickel and Smith17 and recent taxonomic monographs18. Scientific names were checked on the Tropicos.org website (https://www.tropicos.org/home) and in the Catalogue of Life (https://www.catalogueoflife.org/). The specimen names were compared with the type material in JSTOR Global Plants (https://plants.jstor.org/). The herbarium vouchers were deposited at the National Mexican Herbarium of the Universidad Nacional Autónoma de México (MEXU) and at the herbarium of CIIDIR Oaxaca, Instituto Politécnico Nacional (OAX); registration numbers are pending due to the pandemic crisis.

All plant samples were collected in the field under permits issued by the municipal councils of San Miguel Cuevas, Juxtlahuaca. In no case was the full plant collected; the collection process did not kill the plants, which were left alive in their original places. None of the species collected belongs to the near-threatened, vulnerable, endangered, critically endangered, extinct in the wild, or extinct categories of the IUCN Red List (accessed September 7, 2021). Of all species studied, only three are listed by the IUCN, all in the category “least concern”: Asplenium monanthes, Cystopteris fragilis, and Pteridium feei. The rest of the species were not registered in the IUCN Red List, probably because of insufficient data. We therefore encourage more studies to assess the IUCN status of these plants.

Isolation, amplification, and sequencing of DNA

The number of individuals sampled per taxon was generally one and, less frequently, two. Several leaves of each botanical sample were placed in a Ziplock® bag and kept at − 20 °C in a freezer until processed. Genomic DNA was extracted from 2 mg of leaf tissue with the FastDNA SPIN kit and FastPrep® equipment (MP Biomedicals, USA). DNA concentration (ng/µl) and purity (A260/A280) of the total extracted DNA were measured in a Biophotometer (Eppendorf®). Three chloroplast DNA regions were amplified: rbcLa, matK, and the intergenic spacer trnH-psbA. We used standard primers from the Canadian Center for DNA Barcoding (CCDB)19 and a second set of primers for matK20 (see Table 1 for primer DNA sequences). All three chloroplast regions were amplified in a 25 μl reaction mixture: 5 µl of MyTaq reaction buffer (MyTaq DNA Polymerase kit, Bioline), 1 μl of forward primer, 1 μl of reverse primer, 0.2 μl of MyTaq polymerase, 15.8 μl of nuclease-free water, and 2 μl of isolated genomic DNA template. PCR was carried out in an Applied Biosystems Veriti® thermocycler. PCR temperature cycling programs followed the protocols of Fazekas et al.24. PCR for rbcLa: 94 °C for 4 min; 35 cycles of 94 °C for 30 s, 55 °C for 30 s, 72 °C for 1 min; final extension 72 °C for 10 min. PCR for matK: 94 °C for 1 min; 35 cycles of 94 °C for 30 s, 52 °C for 20 s, 72 °C for 50 s; final extension 72 °C for 5 min. PCR for trnH-psbA (for ferns and allies): 94 °C for 4 min; 2 cycles of 94 °C for 45 s, 50 °C for 45 s, 72 °C for 1 min; 35 cycles of 94 °C for 45 s, 45 °C for 45 s, 72 °C for 1 min; final extension 72 °C for 10 min. Amplified PCR products were detected by agarose gel electrophoresis (1.2% agarose gel in TBE) under UV light after staining with GelRed Nucleic Acid stain (Biotium). PCR products were purified using the EZ-10 Spin Column PCR Products Purification Kit (Biobasic).
All PCR products were sequenced by Capillary Electrophoresis Sequencing (CES) in an ABI 3730xl System at the Macrogen sequencing facility (Macrogen Inc., Seoul, Korea).
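The reaction recipe given above adds up to 25 µl per tube. As a minimal sketch (not part of the original protocol), the Python snippet below checks that total and scales the shared components into a master mix for several reactions. The 10% pipetting overage, the choice to add template separately, and the function name are illustrative assumptions.

```python
# Per-reaction volumes (µl) as listed in the text for a 25 µl reaction.
PER_REACTION_UL = {
    "MyTaq reaction buffer": 5.0,
    "forward primer": 1.0,
    "reverse primer": 1.0,
    "MyTaq polymerase": 0.2,
    "nuclease-free water": 15.8,
    "genomic DNA template": 2.0,
}


def master_mix(n_reactions, overage=0.10):
    """Scale every component except the template for n reactions.

    The overage (extra fraction to cover pipetting loss) is an assumption,
    not a value reported in the study. Template DNA is left out because it
    differs per sample and is added to each tube individually.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in PER_REACTION_UL.items()
        if name != "genomic DNA template"
    }


# Total volume of a single reaction, rounded to avoid float noise.
total_per_reaction = round(sum(PER_REACTION_UL.values()), 2)

if __name__ == "__main__":
    print("Total per reaction (µl):", total_per_reaction)
    for name, volume in master_mix(31).items():
        print(f"{name}: {volume} µl")
```

Because primers differ between markers, a mix of this kind would be prepared once per marker.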

Table 1 Primer sequences used for DNA amplification of chloroplast regions rbcLa, matK, and trnH-psbA.

DNA alignment

rbcLa and trnH-psbA sequence chromatograms were manually edited and assembled into contigs using CodonCode Aligner v.9.0.1 (http://www.codoncode.com/aligner/). Due to its low amplification frequency, matK was excluded from further evaluations. Consensus sequences were generated and aligned using MUSCLE25. These alignments were examined by eye and corrected when necessary.

BOLD and GenBank

The project was registered under the name “Ferns and allies of a humid temperate forest in Oaxaca, México”, project code FERNO (http://www.boldsystems.org), at the Barcode of Life Data System (BOLD), which is an informatics workbench devoted to the acquisition, storage, analysis, and publication of DNA barcode records3. Three files were submitted to BOLD. First, the specimen data file included detailed voucher information, scientific names of the taxa sampled, collection dates, geographical coordinates, elevation, collectors, identifiers, and habitat. Then, an image file was submitted with high-quality specimen images of each fern and lycopod collected. Finally, a trace file was submitted along with the primers, the direction of the sequences, and the molecular marker. Sequences uploaded to BOLD were edited and aligned in FASTA format and referenced by Sample IDs. Sequences were also submitted to GenBank.

Species discrimination

To evaluate species discrimination in rbcL and trnH-psbA sequences, we used three approaches: (1) the Basic Local Alignment Search Tool for nucleotides (BLASTN)26, which searches against the online sequence database of the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov); (2) genetic distance; and (3) tree-based monophyly analyses using Neighbor-Joining (NJ), Maximum Likelihood (ML), and Bayesian Inference (BI).

Following previous studies10,27, query sequences having ≥ 99.0% identical sites to sequences in the database were taken as correct assignments. Percentage species resolution was calculated for each plastid region. The combined rbcL + trnH-psbA species resolution was calculated as the cumulative percentage of each molecular marker28.
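The assignment rule and the combined resolution can be illustrated with a short Python sketch. It is a simplification under stated assumptions: identity is computed on two pre-aligned sequences and skips gapped or ambiguous columns (BLAST reports identity over its own local alignment), and the "cumulative percentage" is interpreted here as pooling resolved and evaluated counts across markers. The function names and the toy counts are hypothetical, not taken from the study.

```python
def percent_identity(query, subject):
    """Percentage of identical sites between two aligned sequences.

    Columns with a gap ('-') or 'N' in either sequence are skipped,
    which is a simplifying assumption of this sketch.
    """
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(query.upper(), subject.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * matches / compared


def is_species_match(query, subject, threshold=99.0):
    """Apply the >= 99.0% identical-sites rule described in the text."""
    return percent_identity(query, subject) >= threshold


def pooled_resolution(counts):
    """counts maps marker -> (species resolved, species evaluated).

    Pooling counts across markers is one reading of 'cumulative
    percentage'; it is an assumption of this sketch.
    """
    resolved = sum(r for r, _ in counts.values())
    evaluated = sum(n for _, n in counts.values())
    return 100.0 * resolved / evaluated


if __name__ == "__main__":
    # Toy counts for illustration only (not the study's data).
    toy = {"rbcLa": (2, 3), "trnH-psbA": (1, 2)}
    print(round(pooled_resolution(toy), 2))
```

Note that a pooled percentage always falls between the two single-marker percentages, so under this reading combining markers cannot exceed the better marker.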

To determine the best-fit model of nucleotide substitution for the phylogenetic analyses, jModelTest v.2.0 (ref. 29) was used. We found the general time-reversible model plus gamma distribution (GTR + G) to be the best fit for rbcLa; this model allows variable base frequencies with a symmetrical substitution matrix. For trnH-psbA, the best fit was achieved with the transversion model plus gamma distribution (TVM + G), which has variable base frequencies, variable transversion rates, and equal transition rates. The data set of each plastid region was analyzed alone and in combination. Sequences of rbcL and trnH-psbA were concatenated into a single matrix, rbcL + trnH-psbA, with Mesquite30.

Genetic distances and neighbor-joining trees were constructed in MEGA X31 for each plastid barcode alone and in combination. The NJ bootstrap consensus trees were inferred from 1000 replicates, and the evolutionary distances were computed using the Kimura 2-parameter (K2P) method, with gaps and missing data treated by pairwise deletion. To evaluate which plastid barcode showed more interspecific divergence, and to check for any improvement when the barcodes were used in combination, we conducted two-sample sign tests with the BSDA package in R32.
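For readers unfamiliar with the Kimura 2-parameter distance, the standard formula is d = −½ ln[(1 − 2P − Q) √(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The Python sketch below implements this formula with pairwise deletion. It is an illustration only, not the MEGA X code, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Pairwise deletion: any column where either sequence has a gap or a
    non-ACGT symbol is ignored for this pair only.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion of gaps/missing data
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    if transitions == 0 and transversions == 0:
        return 0.0
    p = transitions / sites
    q = transversions / sites
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0.0 or term2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))


if __name__ == "__main__":
    # One transition in ten comparable sites.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Interspecific distances would then be obtained by applying the function to every pair of sequences from different species.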

We ran ML analyses with the IQ-TREE web server (http://iqtree.cibiv.univie.ac.at). Internal node support was assessed with 1000 bootstrap replicates. Tree inference using Bayesian analysis was run in MrBayes 3.2.2 on XSEDE via the CIPRES supercomputer cluster (www.phylo.org) for 10 million generations. The tree-based methods (NJ, ML, and BI) were used to evaluate which tree produced the greatest species resolution and whether the barcode sequences formed monophyletic groups.

Results

Studied species

Table 2 lists the fern and lycopod species that were determined and used for the barcoding analysis.

Table 2 Ferns and lycopods collected at Mixteca Alta, Oaxaca, samples’ ID, and species determination.

PCR amplification and sequencing success

Using the universal CCDB primers for rbcL and trnH-psbA, fern DNA was successfully amplified in most cases (96.77%). In contrast, we could not obtain matK amplifications with the universal primers (Table 3). A second set of matK primers, designed specifically for most ferns20, was also tested, but it yielded only 19.36% amplification. In particular, we obtained amplicons only from Phanerophlebia macrosora, Dryopteris wallichiana, Asplenium monanthes, Lophosoria quadripinnata, Cystopteris fragilis, and Blechnum appendiculatum. Therefore, further evaluations include only rbcL and trnH-psbA.

Table 3 Proportion of samples successfully amplified and sequenced from three barcoding plastid regions using tissues from different species of ferns and lycopods of the Mixteca Alta, Oaxaca, Mexico.

The sequencing success rate (bidirectional high-quality sequences > 250 bp) was higher for rbcL (93.33%) than for trnH-psbA (80.00%) (Table 3).

BLAST discrimination, BOLD, and GenBank

We found 100% resolution at the family and genus levels for ferns and lycopods using BLASTn with both plastid barcodes. We contributed DNA sequences of species new to the GenBank Taxonomy Database for rbcLa (8 species) and trnH-psbA (16 species). Among the species with accessions already published, rbcLa discriminated to species level in 66.67% of the cases, trnH-psbA in 50%, and rbcLa + trnH-psbA in 60.61%. The best BLAST match per species is shown in Table 4 for the rbcLa plastid barcode and in Table 5 for trnH-psbA.

Table 4 BLAST search best match found in GenBank for ferns and allies of the Mixteca Alta, Oaxaca, Mexico, using DNA sequences obtained from the partial gene rbcLa.
Table 5 BLAST search best match found in GenBank for ferns and allies of the Mixteca Alta, Oaxaca, Mexico, using DNA sequences obtained from the intergenic spacer trnH-psbA.

A specimen data file, an image file, and trace files were submitted to BOLD along with edited and aligned sequences for each of our 29 samples of ferns and two samples of lycopods; they can be accessed through the BOLD DNA database (http://www.boldsystems.org) under the ‘FERNO’ project. Twenty-nine fern sequences and two lycopod sequences were newly obtained in this study for rbcLa and trnH-psbA, and BOLD ID numbers and GenBank accession numbers were generated (Table 6).

Table 6 Ferns and lycopods of the Mixteca Alta, Oaxaca, with their BOLD ID number and GenBank accession number obtained from rbcLa and trnH-psbA amplifications, along with their sequence length.

Genetic distance

The distribution of intraspecific and interspecific K2P distances across all taxon pairs of the ferns of the Mixteca Alta, Oaxaca, cloud forest, obtained from rbcLa, trnH-psbA, and the combined DNA sequences of both plastid barcodes, is shown in Fig. 1.

Figure 1

Distribution of interspecific and intraspecific K2P distances across all taxon pairs of ferns from Mixteca Alta, Oaxaca, obtained in partial gene rbcLa (a), intergenic spacer (b) and concatenated plastid regions (c).

Based on previous work33, we included only one individual of each species to avoid biases created by an unequal number of sequences of each species. Intergenic spacer trnH-psbA had the highest mean interspecific K2P distance (0.3037 ± 0.1645 s.d.) in contrast to the mean values of rbcLa (0.1275 ± 0.0467 s.d.) and the combined DNA barcodes (0.1959 ± 0.0795 s.d.).

Results from the two-sample sign test in R of single and concatenated DNA sequences of rbcLa and trnH-psbA using tissues from different species of ferns and lycopods of the Mixteca Alta, Oaxaca, Mexico, are shown in Table 7. The intergenic spacer trnH-psbA showed the highest interspecific genetic divergence in comparison to rbcLa (median = − 0.1535, P value < 2.2e−16) and both plastid barcodes concatenated (median = 0.0851, P value < 2.2e−16).
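The logic of a paired sign test can be sketched in a few lines of Python. This is a stand-in for illustration, not the BSDA routine used in the study: it counts how many paired differences are positive, drops ties (a common convention and an assumption here), and computes an exact two-sided binomial p-value under the null hypothesis that positive and negative differences are equally likely. The function name and the toy distances are hypothetical.

```python
from math import comb
from statistics import median


def paired_sign_test(x, y):
    """Exact two-sided sign test on paired observations.

    Returns (median of the differences x - y, p-value).
    Pairs with a zero difference are dropped before testing.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    all_diffs = [a - b for a, b in zip(x, y)]
    diffs = [d for d in all_diffs if d != 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return median(all_diffs), min(1.0, 2.0 * tail)


if __name__ == "__main__":
    # Toy interspecific distances for the same taxon pairs (not study data).
    marker_a = [0.10, 0.12, 0.11, 0.13, 0.09, 0.14, 0.10, 0.12]
    marker_b = [0.30, 0.28, 0.35, 0.25, 0.31, 0.40, 0.22, 0.33]
    med, p = paired_sign_test(marker_a, marker_b)
    print(med, p)
```

A negative median, as reported for the rbcLa versus trnH-psbA comparison, means the second marker tends to give the larger distance for the same taxon pair.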

Table 7 Two sample sign-test of interspecific divergence among loci and both plastid barcodes concatenated.

Topology results

Phylogenetic trees based on neighbor-joining (Supplementary Fig. S1, Supplementary Fig. S2, Supplementary Fig. S3), maximum likelihood (Fig. 2, Supplementary Fig. S4, Supplementary Fig. S5), and Bayesian Inference (Supplementary Fig. S6, Supplementary Fig. S7, Supplementary Fig. S8) were reconstructed to evaluate fern and lycopod species discrimination for the two barcode regions rbcL and trnH-psbA, single and combined (rbcL + trnH-psbA).

In the neighbor-joining trees, samples from Polystichum fournieri FERNO022-20 and Elaphoglossum xanthopodum FERNO030-20 were removed from the analysis of concatenated sequences since there were missing sequences in rbcLa data and trnH-psbA, respectively. The tree-based methods (NJ, ML, and BI) evaluated which tree produced the greatest species resolution and whether the barcode sequences generate monophyletic species (Table 8).

Figure 2

Maximum likelihood cladogram of plastid rbcLa for 27 sequences of ferns and 2 sequences of lycopods from Mixteca Alta, Oaxaca, México, tropical montane cloud forest. Bootstrap values based on 1000 replications are listed as percentages at branching points.

Table 8 Proportion (%) of monophyletic fern species and bootstrap or posterior probabilities, in parentheses, recovered with different phylogenetic techniques (NJ, ML, and BI) using single plastid barcodes rbcLa and trnH-psbA and combined DNA regions.

NJ and ML phylogenetic trees resolved 100% of species as monophyletic for rbcLa, trnH-psbA, and both barcodes combined (rbcLa + trnH-psbA), with clade support ≥ 70% based on 1000 bootstrap replicates. The clade support value for rbcLa was higher in the ML phylogenetic tree (85.71%) than in the NJ tree (69.23%), whereas the clade support value of trnH-psbA and rbcLa + trnH-psbA was higher in the NJ trees (84.61%) than in the ML phylogenetic trees (78.57%). Since the mean clade support of all ML trees was 80.95% and the mean clade support of all NJ trees was 79.49%, we conclude that the ML and NJ phylogenetic trees satisfactorily resolved the species monophyly of the studied ferns. We present the ML rbcLa phylogenetic tree (Fig. 2) since it yielded the most robust phylogeny: 85.71% of the nodes were supported by a maximum likelihood bootstrap ≥ 70%.
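The clade support values above can be read as the percentage of internal nodes whose bootstrap value reaches the 70% cutoff. The Python sketch below illustrates that calculation and the averaging across trees. The list of bootstrap values is hypothetical; only the three ML percentages are taken from the text.

```python
from statistics import mean


def clade_support(bootstrap_values, cutoff=70.0):
    """Percentage of internal nodes with bootstrap support >= cutoff."""
    if not bootstrap_values:
        raise ValueError("no nodes to evaluate")
    supported = sum(1 for value in bootstrap_values if value >= cutoff)
    return 100.0 * supported / len(bootstrap_values)


# Hypothetical bootstrap percentages for the nodes of a single tree.
example_nodes = [100, 98, 65, 88, 72, 54, 91]

# ML clade support reported in the text: rbcLa, trnH-psbA, and combined.
ml_support = [85.71, 78.57, 78.57]
ml_mean = round(mean(ml_support), 2)

if __name__ == "__main__":
    print(round(clade_support(example_nodes), 2))
    print(ml_mean)
```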

All Bayesian Inference trees presented polytomies: one in rbcLa (Supplementary Fig. S6), two in trnH-psbA (Supplementary Fig. S7), and one in rbcLa + trnH-psbA (Supplementary Fig. S8). Because of these polytomies, rbcLa could not resolve 4 species as monophyletic, trnH-psbA 18 species, and rbcLa + trnH-psbA 4 species. Unlike the other two phylogenetic methods, BI using concatenated sequences showed an increase in clade support value.

Discussion

Our amplification and sequencing results obtained with rbcLa and trnH-psbA are very similar to those reported in other fern studies34,35,36. In contrast, matK could not be reliably amplified using two different sets of primers (Table 3). Although matK was proposed together with rbcL as the core barcode for plants5,37, ferns appear to be an exception. The failure of matK amplification in most leptosporangiate ferns using standard primers is most likely caused by primer mismatch8,12,20,35,38. In most plants, matK is nested in the trnK intron, but the trnK exons have been lost in ferns20,39. For detailed studies, highly conserved exons in proximity to variable introns are convenient for phylogenetic analysis, because primers situated in the exons amplify efficiently while the introns provide variability40. Due to the low primer universality of matK in ferns, many studies have designed different matK primers only for local ferns12,20,41. Because of the low amplification rates found in this and other studies34,35, we do not recommend the use of matK in ferns, except in particular situations.

Although we found successful genus discrimination with these two plastid barcodes using BLASTn analysis, the low rates of species discrimination are similar to those observed in ferns of Japan42, in which the rate of successful BLAST species discrimination for rbcLa and trnH-psbA was 70.91% and 65.05%, respectively. We could not find any improvement using both barcodes combined, which differs from results obtained in several studies of land plants6,7,43 and ferns35,42. Low rates of species identification using BLAST in our study are not necessarily caused by low marker performance. Four factors may help explain these results. First, misidentified voucher specimens have been recognized as an increasing problem in public DNA databases, as several authors have acknowledged10,28,44. The rate of correctly identified specimens among the published samples is unknown. Second, online accessions in GenBank for our morphological species were limited. We could find published sequences for only 77% of the studied species for rbcLa and 33% for trnH-psbA. Indeed, 27 new rbcLa fern sequences and 27 new trnH-psbA fern sequences, along with two lycopod sequences for each marker, were submitted to BOLD together with their metadata.

Third, the widespread existence of hybridization and polyploidy in ferns42,45,46 is another factor that may decrease barcoding species discrimination37. Finally, translocation has been reported in some fern groups41. Other studies found dramatically reduced trnH-psbA sequence variation for most ferns, probably due to the translocation of this segment into the inverted repeat regions of the plastid genome41. In our case, however, the intergenic spacer trnH-psbA displayed larger interspecific K2P distances than those observed in rbcLa and the combined plastid barcodes (Fig. 1, Table 7). The faster rate of molecular divergence reported in several works5,6,47 for trnH-psbA than for rbcLa in land plants may account for this result. Our results concur with those of a recent meta-analysis of five major plant taxonomic groups8, which found a clear barcode gap in trnH-psbA sequences only in the fern group. Our two-sample sign test reveals that the intergenic spacer trnH-psbA shows greater interspecific divergence than rbcLa and both plastid barcodes combined for the studied group of ferns (Table 7).

We found barcode identification performance similar to that in other fern studies. For instance, higher interspecific variability in trnH-psbA than in rbcLa was also found in a study of filmy ferns in Moorea, French Polynesia36, in a work on Chinese medicinal pteridophytes34, and in studies involving several species of Adiantum35 and Ophioglossum48. However, some exceptions have been found. The mean interspecific divergence values across all taxon pairs (K2P genetic distances) in Japan’s pteridophytes42 did not reveal a significant difference in species discrimination between trnH-psbA and rbcLa. The trnH-psbA translocation mentioned above, which has been reported only in certain groups of ferns, could partly explain these contrasting results among fern studies.

Of all the topologies obtained in this work, the maximum likelihood trees yielded the most robust phylogeny (Table 8). The phylogenetic arrangement found in our study concurs with a recent classification of extant ferns and lycopods49 and with other fern studies42,50. In all of our phylogenetic trees obtained for rbcLa and trnH-psbA, Marattia weinmannifolia is placed near the lycopods. The Marattiaceae are a eusporangiate and ancient family of ferns with a fossil record extending back to the Middle Carboniferous51. In a recent study52, results of parsimony dating showed a minimum age estimate of 201–236 Ma, corresponding to the late Triassic, for the most recent common ancestor of the extant Marattiaceae. Of all the ferns that we studied, the Marattiaceae are the earliest-diverging lineage, which explains their higher similarity with the lycopod outgroup, which is among the oldest groups of vascular plants51.

A paraphyletic clade was observed in the NJ rbcLa tree (Supplementary Fig. S1) and in all three phylogenetic trees of trnH-psbA (Supplementary Fig. S2, Supplementary Fig. S4, Supplementary Fig. S7). Elaphoglossum (E. xanthopodum and E. petiolatum) was placed outside the Dryopteridaceae family clade. The intergenic spacer trnH-psbA was probably more sensitive than rbcLa to nucleotide substitutions in this genus. A morphological and molecular study of Elaphoglossum species53, which does not include our studied species, found that the relationship of Elaphoglossum to other fern genera is not clear. This genus was placed within Dryopteridaceae based on its chromosome number (x = 41) and monolete spores. However, in a recent classification of extant ferns based on new phylogenetic data49, Elaphoglossum was placed in a subfamily separate from the rest of the genera of Dryopteridaceae: Elaphoglossoideae. In agreement with that decision, our phylogenetic trees using trnH-psbA also discriminated Elaphoglossum from the other members of the Dryopteridaceae family.

Conclusions

Based on amplification capacity and sequence quality, the partial gene rbcLa and the intergenic spacer trnH-psbA performed relatively well as barcode markers for ferns in the Mixteca Alta, Oaxaca. Our ML phylogenetic trees agree with the recent phylogeny of extant lycophytes and ferns of the Pteridophyte Phylogeny Group (PPG). rbcLa outperforms trnH-psbA in species discrimination and in the availability of sequences in public databases. However, trnH-psbA outperforms rbcLa in interspecific K2P distances and could therefore be helpful in phylogenetic analyses involving groups that lack the translocation into the inverted repeat, which may reduce discrimination power. We did not find an increase in species discrimination using both plastid barcodes together in BLASTn, genetic distance, or any of the tree-based methods. The plastid barcode matK failed to amplify fern and lycopod DNA using universal primers. Our study pinpoints two problems: the low availability of DNA sequences for neotropical fern species and the need for more phylogenetic and polyploidy studies in ferns to clarify the phylogeny of certain groups, such as Elaphoglossum. We hope that the local barcode library we generated can be the starting point for adding more sequences for a wide range of ecological, conservation, phylogenetic, and medical purposes.