Genetic diversity and population genetic structure analysis of Apis mellifera subspecies in Algeria and Europe based on complementary sex determiner (CSD) gene

In honeybees, the mechanism of sex determination depends on genetic variation at the complementary sex determiner (CSD) locus, which has a large allelic diversity. In this study, we examined the population genetic structure and genetic diversity within the highly variable region (HVR) of CSD in five Apis mellifera subspecies, in addition to Buckfast and unknown mixed ancestry bees. We sequenced CSD in 329 drones, 146 from Algeria (A. m. intermissa and A. m. sahariensis subspecies) and 183 from Europe (A. m. ligustica, A. m. carnica, A. m. mellifera subspecies, Buckfast samples, and individuals of unknown mixed ancestry). A total of 119 nucleotide haplotypes were detected. These corresponded to 119 protein haplotypes, of which 81 were new. The analysis of these haplotypes showed that HVR diversity levels were comparable with those in other populations of honeybee worldwide. Paradoxically, this high level of diversity at the locus did not allow for a separation of the samples according to their subspecies origin, which suggested either an evolutionary convergence or a conservation of alleles across subspecies, and an absence of genetic drift. Our results can be used to provide more information about the CSD diversity to include in breeding programs of honeybee populations.


INTRODUCTION
Hymenoptera is one of 11 orders of holometabolous insects, which includes 200,000 species of sawflies, wasps, ants, and honeybees (Wilson and Holldobler 2005). Hymenoptera lack sex chromosomes, and sex is usually determined by a single locus. These species are characterized by the principle of haplodiploid sex determination system found in 12% of all animal species (Beye et al. 2003). In many haplodiploid hymenopteran species, the molecular mechanism underlying female development depends on heterozygosity at the complementary sex determiner (CSD) locus (Heimpel and de Boer 2008). In honeybees, females (either queens or workers) develop from diploid embryos (two chromosome sets) and are heterozygous for the CSD locus, whereas individuals homozygous for this locus give diploid males. The colony destroys any occurring diploid male during the early development stages, whereas males (drones) developed from unfertilized haploid embryos (one chromosome set) survive (Woyke 1963). CSD which is located in chromosome 3 is 11,734 bp long and consists of 9 exons (Kaskinova and Nikolenko 2017). According to Hasselmann and Beye (2004), these 9 exons are distributed in three regions: 1 to 3, 4 to 5, and 6 to 9 and are separated by two prolonged introns. The polymorphism level in the CSD coding regions is seven times higher than in the noncoding regions (Beye et al. 2003). The genomic region from exons 6 to 9 has the highest polymorphism level compared to the other two regions (Hasselmann and Beye 2004). Nucleotide polymorphisms accumulate within a confined part of the CSD gene, characterized as the Potential Specifying Domain (PSD) (Hasselmann et al. 2008). The PSD region which includes a hypervariable region (HVR) is located on exon 7 which consists of a variable number of repeats, mainly containing A/T rich motifs, and encodes large tyrosine (Y) and asparagine (N) residues. It is flanked by the domains encoding the arginine/serine and proline-rich regions, both of which likely take part in protein-protein interactions. The gene encodes an SR-type protein and is a potential splicing factor (Beye et al. 2003). The CSD locus variability and the allelic composition can be done by only amplifying and sequencing exons 6, 7, and 8 (Lechner et al. 2014).
The CSD gene has evolved by a gene duplication event from an ancestral copy of its paralog gene feminizer (FEM) within the honeybee Apis genus. Heterozygosity at CSD is required to induce the female pathway by interacting with transformer 2 (a protein that contains the RNAbinding domain) leading to a female-spliced FEM transcript. This creates the protein product that is responsible for the female-specific splicing of the Am-DSX transcript (Gempe et al. 2009). Homozygous or hemizygous CSD induces the male pathway, mediated by a truncated FEM protein, which results from an early stop codon in the male FEM mRNA. Consequently, the translation is terminated prematurely, and Am-DSX is spliced in a male-specific manner, producing a protein with a male-specific carboxy-terminal end (Nissen et al. 2012;Biewer et al. 2015;Hasselmann et al. 2010).
Low allelic diversity for CSD leads to a loss of colony strength due to the emergence of nonviable diploid drones; therefore, the identification of alleles is an important task for breeding programs, particularly for those employing artificial insemination of (Kaskinova et al. 2019). The CSD gene in honeybees was first found as having 20 alleles (Beye et al. 2003). Later, Lechner et al. (2014) raised this number to 87 and extrapolated to 116-145 existing alleles. In 2017, 121 different CSD alleles were identified within two Polish honeybee populations (Zareba et al. 2017). In 2020, Bilodeau et al. identified 83 and 62 alleles in two studied populations. CSD evaluation in breeding programs in New Zealand (Hyink et al. 2013), Russia (Kaskinova et al. 2019), and the USA (Bilodeau et al. 2020) found HVR diversity levels comparable to those found in populations of honeybees worldwide (Lechner et al. 2014;Zareba et al. 2017).
In this study, we assessed the haplotypic diversity of the CSD locus, focusing on the HVR of honeybees from breeding apiaries in Europe: A. m. ligustica, A. m. carnica, A. m. mellifera, Buckfast, and unknown population, and from Algeria: A. m. intermissa, A. m. sahariensis, which represents the North Africa. Determination of genetic diversity and phylogenetic analysis were done in order to examine if samples could be separated according to their subspecies and identify the presence or not of haplotypes that had not been previously described. Since some samples are collected from regions never studied before, we expect to find new haplotypes.

Sample collection
Males (drones) (N=381) were collected from Algeria (N=193) and Europe (N=188). Algerian drone adults were collected from 34 different apiaries in 7 different regions. The assignment to subspecies was carried out according to morphological criteria: A. m. intermissa commonly called Tellian bee described by Buttel-Reepen (1906) and A. m. sahariensis successively described by Baldensperger and Haccour (Baldensperger P.J 1932;Haccour P 1960) found in the south of Algeria and Morocco.
A. m. intermissa drones (N = 142) were collected during 2 seasons. In 2017 (late February to early June) from 29 apiaries (N = 88) located in the north west of Algeria (Oran (N = 29), Sidi bel abbes (N = 30), and Mascara (N = 29)). In 2018 (February to July) from 5 apiaries (N = 54) located in the north center and east (Medea (N = 29), Annaba (N = 25) respectively). A. m. sahariensis samples (N = 51) were collected from two apiaries located in the south west of Algeria (Ain Sefra (N = 26) and El Bayadh (N = 25)) ( Fig. 1). All drones sampled from Ain Sefra come from the same hive. A total of 188 drones sampled from Europe at the larval stage used in the present study are part of a larger project on the genome diversity of the major A. m. subspecies commonly found in France (Wragg et al. 2021). These include A. France. In addition, 34 French samples were of unknown genetic background. All the samples were stored in 95% ethanol at − 20 °C until DNA extraction.

DNA extraction method
Prior to DNA extraction, ethanol was totally removed. When the individuals were at an adult stage, we used head and thorax, which was the case for all the Algerian samples. At a nymphal stage, all the body was used for DNA extraction (European samples).
After cutting with a scalpel, the material was placed in tubes containing 490 µL cell lysis buffer (TNES-Urea: 1 M Tris HCl, pH 8, 3 M NaCl, 0.5 M EDTA, 10% SDS). Proteinase K (eurobio GEXPRO01) 10 mg/mL was added in two times: (i) 12.5 µL at 56 °C incubation under constant agitation during 3 h. (ii) 5 µL at 37 °C with an overnight incubation. Precipitation of cell debris was performed by adding 200 µL 3 M NaCl to each sample then stirring the tubes by inversion followed by centrifugation at 12,500 g for 30 min at 4 °C. Each supernatant was carefully transferred to a new tube and 2.5 µL 100 mg/mL RNase were added, and then transferred to a new tube with 100% ethanol to allow DNA precipitation. DNA was pelleted then washed with 70% ethanol, air dried for 5 min, then resuspended in 100 µL TE buffer 10/0.1 (10 mM Tris HCl, 0.1 mM EDTA). When the DNA did not appear after adding 100% ethanol, a centrifugation at 8000 g for 10 min at 4 °C was performed. The supernatant was eliminated and the pellet was resuspended in the same way as described before. DNA solutions were incubated at 37 °C overnight with constant agitation. DNA quality and quantity controls were measured using a NanoDrop 8000 (Thermo Scientific, Wilmington, DE, USA) spectrophotometer and the PicoGreen fluorescence assay. All DNA were submitted to an electrophoresis in 0.8% agarose gel.
A final volume of 25 μL amplified sample consisted of 50 ng DNA [10 ng/µL], 0.5 U Taq™ (Promega ref M8295), 1 × Buffer, 0.2 mmol/L dNTP, 1.5 mmol/L MgCl 2 , 0.4 mmol/L of each primer and ultrapure water. PCR amplification initiated by 5 min initial denaturation at 94 °C, followed by 30 cycles of 30 s denaturation at 94 °C, 30 s annealing at 58 °C, and 30 s elongation at 72 °C, ultimately completed by 20 min terminal elongation at 72 °C. Positive (DNA) and negative (water) control were systematically used in each PCR run. Five microliters of each PCR product was loaded on 1.5% agarose gel. Whenever no PCR product was observed with the primer set 1, an attempt was done with set 2. If unsuccessful, the sample was removed from the study.

Sanger sequencing
The PCR products were sequenced in both directions by an automated Sanger sequencer (ThermoFisher ABI3730). According to the intensity of the band observed on agarose gel, 3 to 8 µL of PCR product was digested with 10 U exonuclease I (Promega M9910) and 0.5 U Shrimp Alkaline Phosphatase (OzymeM0293L) for 45 min at 37 °C. After enzyme inactivation (incubation for 30 min at 80 °C), the sequencing reaction was immediately carried out as follows: 5 × sequencing buffer, BigDye Terminator Cycle Sequencing mix v 3.1, and 5 uM CSD primer (1 cycle of 5 min denaturation at 95 °C, followed by 25 cycles elongation of 5 min denaturation at 95 °C, 15 s at 55 °C, and 4 min at 60 °C). Excess fluorescent ddNTP was removed by sephadex G50 (GE Healthcare) gel filtration chromatography and centrifugation at 755 g for 3 min at 4 °C without break. The purified sequencing products were subjected to capillary electrophoresis (ThermoFisher ABI3730).

Data analysis
The agreement between forward and reverse sequence chromatograms was checked by visualization with Chromas 2.6 (Technelysium Pty Ltd, Brisbane, Australia) and the sequences were exported as FASTA files. The sequence alignment was done with Clustal Omega 1.2.4 with default parameters (Madeira et al. 2019). EMBOSS-Transeq was used to translate DNA sequences to protein in silico (Madeira et al. 2019). Among the 6 reading frames, we selected the one giving the conserved amino acid sequence IEQIP situated after the hypervariable region as described by Lechner et al. (2014). Alignment visualization and edition were done with Jalview 2.10.5 using the default alignment parameters (Waterhouse et al. 2009).

Genetic diversity analysis
Haploid DNA sequence data were analyzed in DnaSP (DNA Sequence Polymorphism) 6.12.01 (Rozas et al. 2017) for diversity measurements: haplotype diversity (Hd), nucleotide diversity (π), average number of nucleotide differences between haplotypes (k), number of variable sites (S), number of haplotypes excluding and including indel sites h1, h2 respectively, number of nonsynonymous substitutions per non-synonymous site (Ka), and number of synonymous substitutions per synonymous site (Ks).

Phylogenetic analysis
Molecular phylogenetic analyses were conducted using MEGA X: Molecular Evolutionary Genetics Analysis across computing platforms (Kumar et al. 2018) to perform the maximum likelihood (ML) tree searches and bootstrap estimation of node support. We used the best-fit substitution model application implemented in MEGA X to compare the available amino acid substitution models and obtain the best description of the substitution pattern by maximum likelihood. Using this approach, the GTR + G + I (General Time Reversible, (+ G) non-uniformity of evolutionary rates among sites may be modeled by using a discrete Gamma distribution, (+ I) assuming that a certain fraction of sites is evolutionarily invariable) model was assigned to be the best-fit evolutionary model for the data set of this present study by MEGA X and tree was inferred from evolutionary distances by maximum likelihood. Trust in nodes was estimated by 1000 bootstrap repeats. The median joining network (MJN) analysis of the haplotypes based on the variable characters of the complete alignment was constructed to infer evolutionary relationships between the Algerian and European haplotypes of this study using the NETWORK 10 software (Bandelt, Forster, and Rohl 1999). The principal component analysis (PCA) was carried out by RStudio v1.4 (R Core Team 2018) using the following packages: the Adegenet 2.1.3 (Jombart and Ahmed 2011), the Ape 5.5 (Paradis and Schliep 2019), and Ade4 1.7-17 (Bougeard and Dray 2018). In this analysis, two outlier samples (Buckfast_ITSAP5 and Sahariensis_Bayadh26) were removed to improve the resolution of the figure.

RESULTS
Out of 381 samples tested, 47 from Algeria and 5 from Europe could not amplify, as evaluated by agarose gel electrophoresis, with either of the two primer sets tested (Supplementary  Table I). Consequently, we obtained 329 amplified DNA samples from Algeria (N = 146) and Europe (N = 183). For some individuals, the length polymorphism of the HVR could be visually observed on agarose gel ( Fig. 2A). We obtained complete sequence coverage and agreement between forward and reverse reactions from the start of intron 6 to the end of intron 7 for all remaining 329 samples: 146 and 183 from Algerian and European populations respectively ( Fig. 2B; Supplementary Fig. 1; Supplementary  Table II).

Nucleotide polymorphism level
Analysis of exon 7 and introns 6 and 7 with DnaSP showed that the length of the nucleotide sequences for the 329 samples ranges between 351 and 370 pb. Due to the different biological mechanisms leading to the generation of SNPs and Indels, to the fact that ambiguities in the placement of indels can influence the count of SNPs, and to the fact that most analysis softwares do not take indels into account, we counted haplotypes with indels, but also gave a more conservative number without indels. Including indel sites, we identified 124 DNA haplotypes: 50 and 72 found in Algerian and European subspecies respectively; and 2 in common between both populations. Excluding indel sites, we identified 108 DNA haplotypes (Table I).
When restricting our analysis to exon 7, the length of the sequences ranged between 211 and 301pb. Including indel sites, we identified 119 different DNA haplotypes: 46 in Algerian subspecies while 69 others were found in European ones. And four were common to both Algerian and European subspecies. Excluding indel sites, we identified 104 DNA haplotypes: 43 and 55 were found in Algerian and European subspecies respectively; and 6 were common to both of them (Table I).
Population genetic diversity indices were calculated using nucleotide data for exon 7. The genetic diversity indices (S, Hd, π, and K) were calculated without including indels except indice h1 (number of haplotypes including indel sites) ( Table I). As shown in Table I, the CSD gene has a high level of polymorphism in all A. m. subspecies. A difference of 15 haplotypes was found when comparing the number of haplotype including indels (h1 = 119) and excluding indels (h2 = 104). The number of segregating (polymorphic) sites was 79, in which tri-allelic (N = 20) and tetra-allelic SNPs (N = 5) were detected. The average number of nucleotide differences (k) varied from 14.04 (for A. m. carnica) to 24.50 (for Buckfast). The haplotype diversity level (Hd) for the 329 sequences was 0.984 confirming the high 1 3 genetic diversity characteristic, and was substantially similar between Algerian and European subspecies. It varied from 0.823 (A. m. sahariensis) to 1 (Buckfast).
In order to compare the variability found in introns 6 and 7 and to compare it to exon 7, the nucleotide diversity (π) was calculated separately for the three regions and was found to be 0.034, 0.091, and 0.026 for intron 6, exon 7, and intron 7 respectively. Thus, exon 7 showed at least a threefold greater diversity than that of its neighboring introns ( Fig. 3; Table I; Supplementary  Table III), consequently confirming the previous observations which showed that the polymorphism level in the CSD coding regions is much higher than in the noncoding regions (Beye et al. 2003).
We compared our haplotypes with the 854 CSD nucleotide sequences available in the Gen-Bank database for the same region. The analysis of exon 7 and introns 6 and 7 shows that 101 of the haplotypes identified here were new: 50 specific to each of the Algerian and European subspecies and one shared by both. The analysis of the exon 7 sequences revealed 85 haplotypes that had not been previously described: 45 were  found only in Algeria, and 39 in Europe. One haplotype was common to the Algerian and European subspecies.

Amino acid polymorphism level
The 329 exon 7 nucleotide sequences were translated into amino acid sequences, giving peptides between 70 and 116 amino acids long. The HVR, enriched in N and Y residues and situated between a SLS and a KKL motif, was variable in length, between 10 and 37 amino acids (Supplementary Fig. 2).
The 329 amino acid sequences were assigned to 119 different amino acid haplotypes, among 46 were found in Algeria, and 69 in Europe. Four haplotypes were shared between Algerian and European populations with different frequencies (Supplementary Table IV). Among all 119 haplotypes, 48 haplotypes were present only once in each individual: A. m. intermissa = 18, A. m. sahariensis = 3, A. m. carnica = 8, A. m. mellifera = 3, A. m. ligustica = 9, Buckfast = 2, unknown = 5. Forty-four haplotypes could be found in more than one individual from the same population. However, 27 haplotypes were shared between individuals from different populations.

3
We compared the obtained amino acid sequences with 650 CSD sequences available in GenBank database. Among the 119 haplotypes, 81 haplotypes were newly described, of which 45 found only in Algeria, 36 only in Europe. The remaining 38 haplotypes were identical to sequences previously found in GenBank. The set of new nucleotide sequences identified in this study have been deposited to GenBank (www. ncbi. nlm. nih. gov/ genba nk) nucleotide sequence database and have been assigned the accession numbers (MZ674091-MZ674175).
In exon 7, the number of non-synonymous substitutions (N = 80) was much higher than the number of synonymous substitutions (N = 27). Testing for positive selection was done by calculating the Ka/Ks ratio. The number of nonsynonymous substitutions per non-synonymous site (Ka) was equal to 0.10, whereas the number of synonymous substitutions per synonymous site (Ks) was equal to 0.09, giving a ratio of 1.1, indicative of positive selection.

Phylogenetic and network analysis
As part of our investigation to check if the relationships between the 119 protein CSD haplotypes identified in the Algerian and European subspecies follow the known genetic structure of the populations studied, we performed a maximum likelihood phylogenetic tree, a median Figure 5. Median-joining network and distribution of unique CSD nucleotide haplotypes for exon 7. Each circle represents one unique sequence in the data set. Circle size is proportional to the total number of individuals with the haplotype. Colors correspond to each one of subspecies. joining network, and a principal component analysis (PCA). All three analyses showed that the haplotypes were not grouped according to their A. m. subspecies origin, but that these genetic origins were well mixed with each other (Figs. 4,  5, 6).

DISCUSSION
Previous studies have shown that the CSD gene in A. mellifera, A. cerana, and A. dorsata has a very high level of polymorphism in these three honeybee species (Hasselmann and Beye 2004;Cho et al. 2006;Hasselmann and Beye 2006;Hasselmann et al. 2008;Wang et al. 2012). In this study, we confirmed that the CSD gene also shows a high level of within subspecies polymorphism, and presented the first results for Algerian subspecies.
In the present work, we sequenced 329 honeybees from different subspecies collected from Algeria in addition to five different European countries. Analysis of these sequences showed that genetic diversity indices such as total nucleotide diversity per site and average number of nucleotide differences between any two sequences agreed with values previously observed in other studies (Hasselmann and Beye 2006;Hasselmann et al. 2008;Wang et al. 2012;Lechner et al. 2014;Bilodeau et al. 2020). The estimate of nucleotide diversity was three times higher in exon 7, as compared to introns 6 and 7. Our results reveal that exon 7 of the CSD locus has a much higher genetic diversity than the mean value in the A. m. genome. This is supported by the lower overall genome average nucleotide diversity found in modern samples (π ≅ 0.001) and historical ones (π ≅ 0.003) (Themudo et al. 2020), as compared to the exon 7 of the CSD locus (π ≅ 0.090) studied here ( Table I).
The haplotype diversity of the CSD protein in the populations sampled is not independent on the sampling size. In the case of A. m. sahariensis, we sequenced in one of the locations (the Ain Sefra site) 26 drones from a single hive, which gave us only two haplotypes, corresponding as expected to the two alleles present in the queen's genome. In the remaining 22 A. m. sahariensis drones sequenced, seven more haplotypes were found. For all the other subspecies or populations with more than 20 drones sequenced, the number of haplotypes ranges between 21 and 41 (Table I). This is well within the number of protein haplotypes found in other studies, in which values vary from 16 to over 100 haplotypes (Hyink et al. 2013;Zareba et al. 2017;Kaskinova et al. 2019;Bilodeau et al. 2020). The diversity could represent even more haplotypes if the 52 samples which were not amplified could be analyzed. This was due probably to the high divergence in the flanking regions where the primers were designed. Additional data on flanking sequence for these samples, such as obtained in whole genome sequencing data, for which there is no specific amplification step, would allow designing new primers.

3
The length of the HVR shows considerable variation across populations worldwide. The range of 10 to 37 amino acids for the length of the HVR we found in our study was well within that found for other bee populations (Hasselmann and Beye 2006;Hasselmann et al. 2008;Wang et al. 2012;Hyink et al. 2013;Lechner et al. 2014;Zareba et al. 2017;Kaskinova et al. 2019;Bilodeau et al. 2020). Typically large ranges can be found, such as 6 to 33 amino acids in Kenyan populations (Lechner et al. 2014) or more restricted ones: between 21 and 38 amino acids in a smaller sampling of a closed breeding population (Hyink et al. 2013).
We found a total of 119 nucleotide and amino acid haplotypes in exon 7 (introns excluded), 48 of which being unique and present in a single individual. Conversely, 27 haplotypes were present in sequences from different subspecies, suggesting either evolutionary convergence or the possibility of a high pressure for haplotype conservation across subspecies. A striking fact is that all 119 nucleotide haplotypes correspond to 119 protein haplotypes, meaning that each nucleotide haplotype has at least one non-synonymous mutation.
When comparing our data to the CSD haplotype sequences found in the international databases, forty-five new haplotypes were found only in the 122 samples from Algeria (excluding A. m. sahariensis drones coming from the same hive), which represent 37%, and 36 new European haplotypes from 183 samples, which represent 19%. Therefore, sequencing new subspecies such as A. m. intermissa and A. m. sahariensis contributed to the identification of new haplotypes that had not been studied with European subspecies.
However, although it appears that each subspecies has its own set of specific haplotypes, the phylogenetic tree, the haplotype network, and the PCA analyses of the sequences obtained failed to reveal any structure following the expected subspecies pattern. This is due to the fact that closely related sequences differing by only very few SNPs can be found in very different subspecies. One hypothesis for the presence of identical haplotypes in two completely unrelated subspecies could be an evolutionary convergence, as already suggested (Hasselmann et al. 2008). The other is the rate of positive selection incredibly high as Hasselmann and Beye (2004) showed. The hypothesis of positive selection acting at the CSD locus was confirmed by the ratio of nonsynonymous to synonymous substitutions (dN/ dS = 1.10) (Jeffares et al. 2015) and supported by the number of non-synonymous substitutions (N = 80) that was much higher than the number of synonymous substitutions (N = 27), in addition to the presence of tri-allelic and tetra-allelic SNPs (usually a rare event), coding for three different amino acids.
The sequencing of the HVR-CSD in exon 7 and its flanking introns in Algerian and European populations contributed to the identification of new haplotypes, especially in the Algerian subspecies that was investigated for the first time for CSD. Thus, it is important to continue to include African subspecies that are underrepresented on CSD variability studies, in addition to the fact that Apis mellifera originated in Africa (Whitfield et al. 2006). The high level of diversity at the locus does not allow for a separation of the samples according to their subspecies, suggesting either evolutionary convergence or a conservation of alleles across subspecies since their separation, and an absence of genetic drift. According to our results, this region will certainly be enriched by new haplotypes as new subspecies are analyzed. This study may help inform bee-breeding programs about CSD haplotype diversity in their populations and help make breeding decisions that minimize the loss in brood viability caused by inbreeding.
Italy and the "Association Conservatoire de l'Abeille Noire Bretgonne" (ACANB) for samples from Ouessant (France). Colonsay and Ouessant samples were collected for the SeqApiPop program. We would also like to thank all the beekeepers for allowing access to their apiaries and their disinterested assistance in sample collection. Sequencing was performed in collaboration with the GeT platform, Toulouse (France), a partner of the National Infrastructure France Génomique, thanks to the support by the Commissariat aux Grands Invetissements (ANR-10-INBS-0009). The mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation.

AUTHOR CONTRIBUTION
NTA, KT, AV, and FR designed the project, FR provided samples for Algerian population, and BB, KB, and AG provided samples for the European populations. FR, GC, and KT performed the laboratory experiments and data analyses. FR and KT deposed sequences in the database. NTA, KT, AV, GC, and FR discuss the results. FR drafted the manuscript. AV, KT, and NTA revised the manuscript. All authors reviewed and accepted the final draft of the manuscript.

AVAILABILITY OF DATA AND MATERIAL
The set of new nucleotide sequences identified in this study have been deposited to GenBank (www. ncbi. nlm. nih. gov/ genba nk) nucleotide sequence database and have been assigned the accession numbers (MZ674091-MZ674175).

CODE AVAILABILITY
Not applicable.

DECLARATIONS
Ethics approval Not applicable.

Consent to participate Not applicable.
Consent for publication Not applicable.

Conflict of interest
The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.