DNA barcoding to assess species identification in museum samples of Amphiliidae and natural samples of Cichlidae from Southern Mozambique

Biodiversity protection and monitoring is one of the main goals of natural history museums worldwide. Conservation issues are particularly important for freshwater fish, which are among the taxa most threatened by the consequences of climate change and human activities. In Mozambique, rivers are poorly explored and the impact of aquaculture and human activities on local biodiversity is almost unknown. Here we present a barcoding analysis of the cytochrome c oxidase I (COI) mitochondrial DNA of 41 freshwater fishes caught in four rivers of southern Mozambique and 53 from a museum collection. As evidence of previous knowledge gaps, barcoding results revealed twenty new haplotypes described for the first time in the families Cichlidae and Amphiliidae. From a methodological point of view, the barcoding approach showed a critical limitation connected to the required 650 bp length of the amplified sequences. In fact, high molecular weight genomic DNA is often unattainable from museum samples and also from wildlife samples collected in pristine rivers. For this reason we further tested the efficiency of DNA mini-barcoding analysis on the 53 fish from the museum collection. The mini-barcode method successfully retrieved 56.6% of sequences versus 3% with standard barcoding. The high performance of this technique is discussed in relation to biodiversity monitoring and to filling taxonomic gaps in museum collections.


Introduction
Molecular approaches are innovative tools that help to identify new taxa of the fish fauna. In particular, molecular techniques based on DNA barcoding can be used to study taxonomy and resolve evolutionary relationships among taxa (Waugh 2007), and to establish correct species identification and origin. DNA barcoding techniques are nowadays widely applied in different fields, such as verifying product labelling accuracy in commercial fish and seafood (Filonzi et al. 2010, 2021), identifying threatened animal species used in the illegal wildlife trade (Staats et al. 2016) and assessing the level of biodiversity between and within different animal taxa (Bezeng and Bank 2019). This approach is also valuable for identifying species displaying low interspecific divergence, such as those belonging to the megadiverse neotropical freshwater fish fauna (Pereira et al. 2013).
Claudio Ferrari and Erica Tovela contributed equally to this work.
This paper belongs to a Topical Collection originated from a long scientific collaboration between Mozambican and Italian universities promoted by the Italian Agency for Development Cooperation.

The application of genetic studies to the identification of African freshwater fish is an urgent need considering the potential richness of species and the paucity of available data in the fields of taxonomy, population genetics and conservation genetics (Pereira et al. 2013). Focusing on specific African countries, most genetic studies have examined aquaculture stocks or introduced populations rather than natural ones (Mojekwu et al. 2021). That is the case of Mozambique, where freshwater fish biodiversity has rarely been investigated and many gaps are still waiting to be filled. For this reason, in this work we adopted a dual approach, investigating two different fish taxa of Mozambique that raise issues in the commercial field and in the natural sciences, respectively: species belonging to the families Cichlidae and Amphiliidae.
Within the family Cichlidae, Tilapia is one of the most complex genera, with three subgenera distinguished taxonomically mainly by reproductive and feeding behavior: Oreochromis (female mouthbrooders), Tilapia (biparental and substrate spawners), and Sarotherodon (biparental mouthbrooders) (Yáñez et al. 2020). The evolutionary history of these taxa is still unresolved. In fact, a taxonomic approach based on behaviour and ecological habits certainly underestimates the complexity and genetic diversity within the taxon. Despite this, genetic manipulation and commercial spreading of tilapiines are increasing (Yáñez et al. 2020). In fact, they are becoming invasive alien species worldwide, and knowing their genetic profiles is becoming fundamental (Cassemiro et al. 2018).
In Mozambique, five species of Tilapia are currently described, namely Oreochromis mossambicus, Oreochromis aureus, Coptodon rendalli, Oreochromis mortimeri and Oreochromis placidus. O. mossambicus is the most cultivated and studied species worldwide because of its ability to adapt to various habitats and to environmental fluctuations in salinity, temperature and dissolved oxygen (Russell et al. 2012). However, the lack of information on and mapping of native cichlid species in the water systems of Mozambique has generated several conservation issues, including the introduction of exotic tilapiines into the natural environment, which may harm native endemic fish. In fact, aquaculture has been growing significantly in recent years in Mozambique, following the government's call to increase family income and to reduce pressure from fishing at sea. At the same time, several river and stream systems remain poorly explored and deserve more attention from a conservation standpoint. In fact, recent results obtained for several fish species from mountain streams in Southern Africa have shown high diversity at the species level and taxonomic conflicts (Chakona et al. 2018; Schmidt et al. 2015). For this reason, in this work further attention has been dedicated to the understudied genus Amphilius, a cryptic taxon that is among the smallest of the Siluriformes and inhabits a still circumscribed geographic range (Chakona et al. 2018). Four new species of Amphilius were recently described by Mazungula and Chakona (2021), and the existence of other genetic lineages within the family Amphiliidae has been hypothesized. The opportunity to study such species in pristine areas of Mozambique could also provide insights into the patterns of endemism of African stream fishes.
Based on the above considerations, we present a molecular genetic study that couples samples from museum collections with samples from natural populations. In particular, a molecular characterization of the families Cichlidae and Amphiliidae, sampled in Southern Mozambique rivers and in the Natural History Museum of Maputo, was carried out to define their taxonomy through a double approach chosen in relation to DNA quality. In fact, samples from museum collections and pristine natural sites often display degraded DNA (Whitfield 1999; Hajibabaei et al. 2006). For this reason, both COI barcoding and mini-barcoding were applied to fish samples to fill knowledge gaps and complete the species identification (Sultana et al. 2018).

Sampling area
Forty-one freshwater fish samples were collected in 2018 and 2019 by traditional methods, using nets and with the help of local fishermen, in four different rivers of Southern Mozambique (Fig. 1 and Table 1). In addition, 53 specimens belonging to the Natural History Museum (NHM) of Maputo were also obtained and analyzed.

Sampling and extraction
For each individual, a small fragment of caudal fin was clipped and stored in 95% ethanol. Genomic DNA was extracted and purified with the Wizard® Genomic DNA Purification Kit (Promega), and DNA quality was visually inspected by 1% agarose gel electrophoresis in TAE buffer before amplification of the cytochrome c oxidase I (COI) mitochondrial gene.

PCR amplification and sequencing
A fragment of the COI mtDNA gene was amplified using the standard primers FF2d and FR1d for regular barcoding described by Ivanova et al. (2007). Polymerase chain reaction (PCR) amplification was performed following Filonzi et al. (2010) in a final volume of 25 μl containing 30 ng of genomic DNA, 1 U of Biotherm Red DNA Polymerase (Fisher Molecular Biology), 2.5 mM Mg2+, 10 pmol of each primer, 1× reaction buffer and 0.2 mM dNTPs. PCR was conducted with an initial 5 min denaturation step at 94 °C, followed by 35 cycles of 30 s at 94 °C, 45 s at 52 °C and 1 min at 72 °C, and a final extension at 72 °C for 10 min.
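As a rough check on the reaction set-up described above, the following Python sketch converts the 10 pmol of each primer in a 25 μl reaction to a molar concentration and sums the programmed hold times of the cycling protocol. The function names are illustrative only, and the time estimate ignores ramping between temperatures, so it should be read as a lower bound and not as a measured run time.

```python
def primer_concentration_um(amount_pmol: float, volume_ul: float) -> float:
    """Final primer concentration in µM (1 pmol/µl equals 1 µM)."""
    return amount_pmol / volume_ul


def program_hold_time_min(initial_s, cycles, step_seconds, final_s):
    """Total programmed hold time in minutes, ignoring temperature ramping."""
    return (initial_s + cycles * sum(step_seconds) + final_s) / 60


# Values taken from the standard barcoding protocol in the text:
# 10 pmol of each primer in 25 µl; 5 min at 94 °C, 35 cycles of
# 30 s / 45 s / 60 s, and a final 10 min extension.
primer_um = primer_concentration_um(10, 25)
hold_min = program_hold_time_min(5 * 60, 35, [30, 45, 60], 10 * 60)

print(f"Primer concentration: {primer_um:.2f} µM each")
print(f"Programmed hold time: {hold_min:.2f} min")
```

With these inputs each primer is present at 0.4 µM and the programmed steps add up to 93.75 min.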
Alternatively, a shorter fragment of COI mtDNA was amplified using the mini-barcode universal primers for fish species described in Sultana et al. (2018): Fish_miniFW: ATC ACA AAG ACA TTG GCA CCCT and Fish_miniRV: AAT GAA GGG GGG AGG AGT CAGA. A 295 bp fragment of the mitochondrial COI gene was amplified by PCR using a Bio-Rad T100 Thermal Cycler. The cycling conditions were 95 °C for 10 min, followed by 34 cycles of 95 °C for 45 s, 57 °C for 45 s and 72 °C for 45 s, and a final step at 72 °C for 10 min. COI amplicons were separated by 2.5% agarose gel electrophoresis followed by ethidium bromide staining and purified by elution following the Promega Wizard Purification Kit protocol. They were sequenced externally by Macrogen (Korea) using both the COI forward and reverse primers.
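The two mini-barcode primers quoted above can be sanity-checked with a few lines of Python. The sketch below strips the spacing from the printed sequences and reports length, GC content and reverse complement; the helper names are illustrative and are not part of any published pipeline.

```python
# Primer sequences exactly as printed in the text (spaces are cosmetic).
PRIMERS = {
    "Fish_miniFW": "ATC ACA AAG ACA TTG GCA CCCT",
    "Fish_miniRV": "AAT GAA GGG GGG AGG AGT CAGA",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq: str) -> str:
    """Remove spaces and normalise to upper case."""
    return seq.replace(" ", "").upper()


def gc_count(seq: str) -> int:
    """Number of G and C bases in a cleaned sequence."""
    return sum(base in "GC" for base in seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a cleaned DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


for name, raw in PRIMERS.items():
    seq = clean(raw)
    gc = 100.0 * gc_count(seq) / len(seq)
    print(f"{name}: {len(seq)} nt, GC {gc:.1f}%, "
          f"reverse complement {reverse_complement(seq)}")
```

Both primers are 22 nt long; the reverse complement of the reverse primer is the motif expected at the 3' end of the amplicon on the forward strand.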
To verify the results, about 20% of the samples, chosen at random, were re-sequenced using the CEQ™ DTCS-Quick Start Kit (Beckman Coulter) and analyzed by capillary electrophoresis on an automated CEQ8000 DNA Analysis System (Beckman Coulter). Sequences were manually edited for sequencing errors using MEGA X software (Kumar et al. 2018) and then aligned using the ClustalW algorithm (Thompson et al. 1994). Species identity was determined by trimming the sequences to equal length and comparing each one against GenBank using the Basic Local Alignment Search Tool (BLAST). A similarity cutoff of ≥ 97% was used for species-level identification of the sequences submitted to the GenBank database.
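The 97% cutoff can be illustrated with a minimal Python sketch that computes the percent identity between two sequences that are already aligned and trimmed to equal length. This is not BLAST, which scores local alignments; the function names and the toy sequences are assumptions made purely for illustration.

```python
def percent_identity(query: str, subject: str) -> float:
    """Percent identity over aligned columns, skipping columns with gaps."""
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    matches = 0
    for q, s in zip(query.upper(), subject.upper()):
        if q == "-" or s == "-":
            continue  # gap columns are not counted
        compared += 1
        matches += q == s
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def species_level_match(query: str, subject: str, cutoff: float = 97.0) -> bool:
    """True when identity reaches the species-level cutoff (>= 97% in the text)."""
    return percent_identity(query, subject) >= cutoff


# Toy 100-column example: three and four substitutions against a reference.
reference = "ACGT" * 25
three_subs = "TTT" + reference[3:]
four_subs = "TTTA" + reference[4:]

print(percent_identity(three_subs, reference), species_level_match(three_subs, reference))
print(percent_identity(four_subs, reference), species_level_match(four_subs, reference))
```

In this toy case three substitutions in 100 columns still meet the cutoff, whereas four do not.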

Statistical analysis
After alignment, new haplotypes were compared with all available sequences of Amphilius, Coptodon and Oreochromis in GenBank and analyzed with TCS software version 1.21 (Clement et al. 2000). A Maximum Likelihood phylogenetic tree was obtained from the COI mtDNA sequences using PhyML 3.0 (Guindon et al. 2010). The best substitution model was selected by the Akaike Information Criterion (AIC) and branch support was assessed with the approximate likelihood-ratio test (aLRT; Anisimova and Gascuel 2006). The phylogenetic tree was visualized with FigTree software (version 1.6).
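Model selection by AIC can be sketched as follows, using AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood. The log-likelihood values below are invented purely for illustration and are not derived from the study's data; the parameter counts cover only the substitution model (rate parameters plus free base frequencies), with branch lengths passed as a separate constant.

```python
# Free parameters of common nucleotide substitution models
# (exchangeability/rate-ratio parameters plus free base frequencies).
MODEL_FREE_PARAMS = {"JC69": 0, "K80": 1, "HKY85": 4, "GTR": 8}


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(log_likelihoods: dict, n_branches: int = 0):
    """Return the model with the lowest AIC and the full score table."""
    scores = {
        model: aic(lnl, MODEL_FREE_PARAMS[model] + n_branches)
        for model, lnl in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores


# HYPOTHETICAL log-likelihoods, for illustration only.
example_lnl = {"JC69": -2450.0, "K80": -2410.0, "HKY85": -2380.0, "GTR": -2378.5}
winner, table = best_model(example_lnl)

for model, score in sorted(table.items(), key=lambda item: item[1]):
    print(f"{model:6s} AIC = {score:.1f}")
print("Selected:", winner)
```

Because every model is fitted to the same tree topology, the branch-length count shifts all AIC values by the same amount and does not change the ranking.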

Results
A final number of fifty-seven samples were successfully amplified for mtDNA barcoding out of 67 initially selected. In particular, 16 COI sequences of Amphilius from the museum collection and 41 of Tilapia (Table 1) were analyzed. Only 16 museum samples (3.0%) were successfully amplified with the COI barcoding primers of Ivanova et al. (2007), due to DNA degradation and fragmentation. Similarly, some of the Tilapia samples were discarded because of DNA fragmentation caused by very high temperatures during the field campaigns, when no ice was available.
Four different genera emerged from the analysed sequences: Amphilius, Coptodon, Oreochromis and Tilapia (accession numbers from GenBank and taxonomic data are listed in Table 2). The final consensus length of all barcoded sequences was 684 bp. No stop codons, insertions or deletions were observed. The frequency analysis of the nucleotide pairwise comparison showed that 426 of 684 (62.28%) sites were conserved in all samples, and 194 of 684 (28.36%) sites were variable. Parsimony-informative sites numbered 171 (25%), and 23 (3.36%) singleton sites were present. The most representative tree model based on nucleotide substitutions was HKY85 (Hasegawa-Kishino-Yano, 85). Twenty new haplotypes were described for the first time: 12 for the genus Oreochromis, 2 for Tilapia and 2 for Coptodon among cichlids, and 4 for Amphilius. The haplotypes were compared with sequences of the cichlid (Fig. 2) and Amphilius genera (Fig. 3) from GenBank. The GTR model was selected by the AIC method as the best model of nucleotide substitution.
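The site categories reported above (conserved, variable, parsimony-informative and singleton) can be counted from an alignment with a short Python sketch. It uses the usual definitions: a variable site is parsimony-informative when at least two nucleotides each occur in at least two sequences, and a singleton otherwise. The function name and the four-sequence toy alignment are assumptions for illustration, and columns containing gaps or ambiguous bases are simply set aside here.

```python
from collections import Counter


def classify_sites(alignment):
    """Count conserved, variable, parsimony-informative and singleton columns."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    counts = Counter()
    for column in zip(*alignment):
        bases = [base.upper() for base in column]
        if any(base not in "ACGT" for base in bases):
            counts["skipped"] += 1  # gap or ambiguity code in this column
            continue
        freq = Counter(bases)
        if len(freq) == 1:
            counts["conserved"] += 1
            continue
        counts["variable"] += 1
        # Informative: at least two states each seen in two or more sequences.
        if sum(1 for n in freq.values() if n >= 2) >= 2:
            counts["parsimony_informative"] += 1
        else:
            counts["singleton"] += 1
    return counts


# Toy alignment (not data from this study).
toy = ["ACGTAC", "ACGTAT", "ACTTAC", "ATTTAC"]
result = classify_sites(toy)
for key in ("conserved", "variable", "parsimony_informative", "singleton"):
    print(f"{key}: {result[key]} of {len(toy[0])} sites")
```

By construction, the parsimony-informative and singleton counts always add up to the number of variable sites, as they do for the 171 and 23 sites reported in the text (194 variable sites).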
All the samples from the museum collection were subsequently amplified with the COI mini-barcode primers, and 30 samples out of 53 (56.6%) were successfully amplified for 198 bp (Table 3).

Discussion
The barcoding approach based on the analysis of COI fragments is a useful methodology to study the taxonomy of freshwater fish and define new genetic lineages (Iyiola et al. 2018). The identification of native freshwater fish is necessary to delineate the evolutionary history of these species and to detect new mitochondrial variants, enriching the molecular diversity dataset of Southern Mozambique rivers. In this work a high number of newly detected sequences has been described for the first time, revealing a widespread underestimation of the biodiversity of Cichlidae and Amphiliidae in this African country. Despite the limited number of analyzed sequences, due to sampling difficulties during field trips, the large number of new haplotypes attests to the persisting gap in knowledge regarding the freshwater fishes of Mozambique.
Although based on just one mitochondrial marker, the results highlight a large amount of molecular diversity, in agreement with a recently published paper by Mazungula and Chakona (2021). These two authors discovered four new species of Amphilius at a nearby sampling site in Southern Mozambique. Despite the geographical closeness and similar habitats, our experimental sequences of Amphilius clustered in a different node, separated from the main cluster of A. uranoscopus described by Mazungula and Chakona (2021). However, whether this is a new species or an ecotype will have to be demonstrated by further investigations. In any case, the results highlight how understudied this genus is and how undefined its geographical range remains. It can be hypothesized that this taxonomic group in Mozambique more properly belongs to a still unresolved species complex. Nowadays, it is fundamental to couple traditional taxonomy based on morphological characteristics with molecular methods in order to detect divergent lineages reflecting different evolutionary histories (Chakona et al. 2018; Thomson et al. 2015).
DNA barcoding is also an efficient tool for identifying possible mismatches between species and incorrect morphological identification. In fact, regarding the family Cichlidae, in this study different unresolved sequences and mismatches were found for Oreochromis sp., Tilapia and Coptodon genera (Fig. 2). More precisely, the genus Tilapia is undergoing revision, and sequences of Tilapia guineensis obtained from GenBank cluster inside our group of Coptodon samples. A general difficulty in tilapia taxonomy has already been demonstrated by Sonet et al. (2019), who highlighted inconsistent species assignments between Coptodon and different similar genera inhabiting the Congo River, due to undescribed species and unknown intraspecific variation. It is also noteworthy that the growth of data in public databases has been associated with an increase in erroneous sequences, which could be due to misidentified species and potential sequencing errors (Shen et al. 2016). Particularly for the tilapia taxon, genetic studies can map existing species in the wild, taking into account introduced fish, and indicate the extent to which new species pose a danger in terms of hybridization, genetic mutations and the extinction of native taxa (Cassemiro et al. 2018).
Results from mini-barcode DNA analysis have previously revealed the efficiency of this technique in preventing species fraud on commercial markets (Shokralla et al. 2015; Filonzi et al. 2021). Our similar approach also demonstrated its efficiency in the case of museum samples with damaged DNA. In fact, 56.6% of samples were successfully sequenced by mini-barcode versus 3% of reliable results obtained with the method of Ivanova et al. (2007).
In this work, the increasing occurrence of Oreochromis niloticus in natural habitats may be an indicator of how aquaculture is carried out in the country and of how it may impact natural resources and endemic species. In this case, little or no information exists regarding the introduction of this North African species in Mozambique, and the genetic results open new perspectives for the conservation biology of African fish, considering the high reproductive potential and ecological plasticity of these invasive species, connected to the fast generation of molecular variants.