Towards disentangling the classification of freshwater fish trypanosomes

Currently, new species of freshwater fish trypanosomes, which are economically important parasites, are being described based on subjectively selected features, i.e., their cell morphology and the host species. We have performed detailed phylogenetic and haplotype diversity analyses of all 18S rRNA genes available for freshwater fish trypanosomes, including the newly obtained sequences of Trypanosoma carassii and Trypanosoma danilewskyi. Based on a sequence similarity of 99.5%, we divide these trypanosomes into 15 operational taxonomic units, and propose three nominal scenarios for distinguishing T. carassii and other aquatic trypanosomes. We find evidences for the existence of a low number of freshwater fish trypanosomes, with T. carassii having the widest geographic and host ranges. Our analyses support the existence of an umbrella complex composed of T. carassii and two sister species. Supplementary Information The online version contains supplementary material available at 10.1007/s42995-023-00191-0.


Introduction
Trypanosomes are hemoflagellates that parasitize all groups of vertebrates, including marine and freshwater fishes (Kostygov et al. 2021), among which they are transmitted by blood-sucking leeches (Khan 1976;Letch 1980) or other blood-sucking invertebrates.Symptoms of fish trypanosomiases range from mild anemia to deadly infections with a high parasite burden (Dyková and Lom 1979;Khan 1985).Serious trypanosome infections were previously documented in a range of fish species, yet their economic impact was rather limited (Woo and Poynton 1998).This has changed with the development of a large-scale aquaculture industry, characterized by high density of fish associated with stress and easy transmission of pathogens.Indeed, several recently documented outbreaks of trypanosomiases in fishes (Jesus et al. 2018;Luo et al. 2019;Su et al. 2014) fulfill the criteria of newly emerging diseases with a potential of major economic losses.Largely due to the lack of our knowledge about the biology of these widespread yet so far rather overlooked pathogens, no available against the fish trypanosomiasis.
Using various criteria, over 280 species of trypanosomes have been described from fishes (Chen et al. 2022;Eiras et al. 2012;Gupta and Gupta 2012;Jesus et al. 2018;Lemos et al. 2015;Su et al. 2014;Woo and Poynton 1998), mostly from freshwater hosts.While the traditional taxonomy of fish trypanosomes was based on their morphology and host specificity (Burreson and Pratt 1972;Qadri 1962), it is now generally accepted that these criteria are largely unsuitable for trypanosomes, due to the low number of measurable features, the high variability of these features, pleomorphism, and widely varying host specificity (Jansen et al. 2017;Kostygov et al. 2021).Indeed, Lom (1979) noticed the necessity for a thorough revision of named species of fish trypanosomes, since he considered most of them as synonymous.Just as an example, Trypanosoma mukasai from Haplochromis spp. was suspected to be synonymous with Trypanosoma tobeyi from Clarias angolensis (Baker 1960), which was later confirmed by cross-infection experiments (Negm-Eldin 1998).
Consequently, several attempts have been made to use other criteria to delimit fish trypanosome species, such as agglutinability of surface lectins (Zajíček and Lukeš 1992), activities of selected enzymes (Zajíček and Pecková 1995), size of kinetoplast DNA minicircles (Jirků et al. 1995), and the composition of surface carbohydrates (Feng and Woo 1998).The limits of morphology-based identification, also encountered in the well-studied trypanosomes of humans and other mammals (Hoare 1972), were largely circumvented by the adoption of methods based on conserved gene sequences, primarily 18S rRNA (Gibson et al. 2017) and glyceraldehyde 3-phosphate dehydrogenase (GAPDH) (Hamilton and Stevens 2017).Moreover, cross-infection experiments revealed that some trypanosomes are confined to a single host species (Noyes et al. 2002), while others, such as Trypanosoma cruzi, have been encountered in more than 100 mammalian species (Jansen et al. 2017).
So far, the 18S rRNA-based classification has been applied in numerous protist lineages, e.g., Sessilida (Lu et al. 2023), while in case of fish trypanosomes only to European freshwater fish trypanosomes, revealing their clustering into three clades, the composition of which did not agree with the morphology-based classification (Gibson et al. 2005).While some species fell into a single clade (e.g., Trypanosoma abramidis), others appeared in two (e.g., Trypanosoma tincae) or even all three clades (T.carassii) (Gibson et al. 2005).The latter species is the best studied fish trypanosome (Lom and Dyková 1992) that may occur also on other continents (Zhang et al. 2022).However, due to the lack of comprehensive sequence datasets, the species definition of T. carassii remains unresolved.
Here, we have established the 18S rRNA, GAPDH and internal transcribed spacer 1 (ITS1) sequences of several T. carassii isolates.The information contained in these sequences allowed us not only to re-address the taxonomic conundrum of T. carassii, but also to propose criteria for species definition suitable for freshwater fish trypanosomes.This should facilitate further research of these increasingly important parasites.

18S rRNA-based phylogenetic analysis
First, we performed phylogenetic analysis using the 18S rRNA sequences of freshwater fish trypanosomes, as this gene is widely available and information-rich.Moreover, we generated four clones from a recently isolated Trypanosoma sp.micropteri that were found to be identical with the original sequence (OM397104) (Zhang et al. 2022).The T. carassii isolates described previously (Gibson et al. 2005;Zhang et al. 2022) and other trypanosomes related to T. carassii for which almost complete (> 2 kb) 18S rRNA sequences available were preferentially selected (Table 1).Phylogenetic trees were built using neighbor-joining and maximum likelihood methods.As expected, all freshwater fish trypanosomes group into a single clade, while marine fish trypanosomes form a sister group (Supplementary Fig. S1), similar to a merged tree (Fig. 1, merged from Supplementary Figs.S1-S6).However, within the freshwater clade, sequences could be subdivided into at least seven operational taxonomic units (OTUs).Among these, isolates EL-2, CaC-RA p15, R6, Ts-Cc-Sp and Se, Ts-Tt-HOD constitute OTU A, isolates Cc-NEM, Ts-Se-BL, Ts-AB-TB, LUMP 1243 and T. granulosum (UK) form OTU B, T. danilewskyi (TrCa), MARVp3, Trypanosoma sp.fulvidraco, T. micropteri (T.danilewskyi) and Trypanosoma sp.micropteri cluster into OTU C, and Trypanosoma ophiocephali, T. siniperca and Trypanosoma sp.carpio group together as OTU D. In addition, isolates Cc, CLAR (AJ620555 and OQ130038) and Trypanosoma sp.K_A from leech form OTUs E, I and N, respectively (Supplementary Fig. S1).These OTUs were built based on the genetic distances from MEGA analysis, with grouping criteria of ≤ 0.0040 (Supplementary Fig. S2), while the inter-group genetic distances are much larger, except A-B (0.0035-0.0071) on the edge of merge-or-split (Supplementary Table S1).
To further verify consistency of this grouping, we analyzed the degree of DNA divergence among the abovecharacterized OTUs by DnaSP6.0.The average number of nucleotide differences (K) within each OTU ranged from 1 to 4.6000, while the nucleotide diversity (Pi) ranged from 0.0005 to 0.0023 (Table 2).The corresponding intergroup Kxy and Dxy were much larger (9.1670-89.4000and 0.0045-0.0442)(Table 3).
When gaps are included, the genetic distances of these sequences display a pattern similar to that described above (Supplementary Fig. S2, Table S2).However, the genetic distances are significantly higher, particularly the intragroup distances within OTU C (≤ 0.0093), and exceed the inter-group distances for OTUs A and B (0.0044-0.0088), suggesting a merge of A and B into a single combined OTU.Unfortunately, we were not able to calculate K, Pi, Kxy and Dxy including the gaps.
In order to incorporate all relevant sequence data into the phylogenetic tree, we truncated the 2 kb-long alignment into a 1.4 kb-long version, so that it matched the length of 18S rRNA regions available for several fish trypanosomes in GenBank.Thus, following the integration of Ct-1, E1 and Sl, the above-described intra-group distances were adapted to ≤ 0.0015 (OTU A), ≤ 0.0067 (OTU B), ≤ 0.0051 (OTU C) and ≤ 0.0014 (OTU D).Consequently, the shortest inter-group distances are between OTUs A and B (0.0044-0.0114), and C and E (0.0059-0.0105), on the edge of merge-or-split (Supplementary Fig. S3, Table S3).The truncated distances within the intra-group K range from 0.6670 to 4.0000, and Pi ranges from 0.0005 to 0.0029 (Supplementary Table S4), while the corresponding inter-group Kxy and Dxy are much larger (9.4580-80.0000and 0.0068-0.0578)(Supplementary Table S5).Moreover, when gaps are included, the genetic distances of these sequences display yet another pattern, with maximum intra-group genetic distances present in OTU C (≤ 0.0128), exceeding the inter-group distances of OTUs A and B (0.0058-0.0122) (Supplementary Table S6).Consequently, Ct-1 (KJ601719) stands out as a new OTU O, while E1 (KJ61714) and S1 (KJ61723) form OTUs H and F, respectively, with all three constituting sister groups of OTUs E and C (Supplementary Fig. S3).
We were also able to incorporate all the above sequences into a larger dataset of various lengths and generate a new phylogenetic tree (Fig. 1).We found that all the defined 15 OTUs retain the same stable evolutionary relationships in the new phylogenetic tree, as those based on analyses of partial data (Supplementary Figs.S1-S6).In particular, OTUs A and B are on the edge of merge-or-split, while two groups (J/K/L and M/N) are divergent (Table 1).
Next, we compared the divergence among each 18S rRNA dataset and established their correlation coefficients (Supplementary Fig. S2), which may allow data from various 18S rRNA regions to be correlated.For this, the corresponding mutation steps illustrated (Fig. 2; Supplementary Fig. S7).This approach confirmed that OTUs N and J, representing J/K/L and M/N, respectively, are significantly divergent from the remaining OTUs of the network.Although the ranges of mutation steps among 10 OTUs seem to overlap, the relationships within A/B, C/D/E/O and F/H, respectively, are very close (Fig. 2; Supplementary Fig. S8).

GAPDH-and ITS1-based phylogenetic analyses
To increase the robustness of our phylogenetic study of freshwater fish trypanosomes, we also analyzed the GAPDH gene, which is generally the second marker of choice for reconstructing trypanosome phylogeny (Gibson et al. 2017;Hamilton and Stevens 2017).However, this gene sequence is available from a narrower set of species as compared to the 18S rRNA gene.Still, it allowed us to interrogate the relationships among seven OTUs.Consistent with the results obtained with the 18S rRNA sequences, the GAPDH sequences robustly split fish trypanosomes into a freshwater clade and a marine clade, with OTUs A grouping alone, and OTUs I and G grouping with C (Fig. 3).However, T. granulosum (UK) (AJ620246), which in the 18S rRNA-based tree  groups with OTU B, affiliates in the GAPDH dataset with OTU G.At this point, however, we cannot rule out the possibility of miss-labeling, since some isolates of T. granulosum (e. g., Portugal) firmly group with OTU G (Fig. 3).ITS1 is another useful marker for population structure within a species (Wen et al. 2016), although it has not previously been applied to fish trypanosomes.We generated 14 ITS1 sequences from EL-2 (T.remaki), CaC-RA p15 and Ts-Cc-Sp from OTU A, T. carassii (Cc-NEM) from OTU B, MARV p3 and Trypanosoma sp.micropteri from OTU C, and Trypanosoma sp.(CLAR) from OTU I. Our comprehensive analysis of this marker shows that ITS1 is highly divergent, with variability even within the same DNA sample.Nevertheless, the obtained data are consistent with OTUs A and B being indistinguishable, and OTUs I and C grouping together (Supplementary Fig. S9).

Comparison of morphological parameters
Next, we checked whether the published morphological data of freshwater fish trypanosomes provide any clue as to their taxonomy and phylogeny.To that end, only isolates with known molecular barcode(s) (e.g., 18S rRNA) were selected.In total, only one morphological dataset is available for OTUs A, E, F and H, while four datasets are available for OTU C and three for OTU D (Table 4).For members of OTUs C and D, the following parameters are available in the literature (distances in μm): PK (posterior end to kinetoplast), 0.7-1.7 and 1.3; KN (kinetoplast to nucleus), 8.3-12 and 13.7-18.9;PN (posterior end to nucleus), 8.4-12.7 and 15.6-21.5;NA (nucleus to anterior end), 6.6-9.7 and 8.1-12.4;BL (body length), 15.6-22.4 and 28.1-30.3;FF (free flagellum length), 10.9-15.7 and 11.4-17; L (total length including FF), 26.5-37.7 and 39.5-46.6,BW (body width), 1.2-2.3 and 1.3-1.6,respectively (Table 4; Fig. 4).Z-test analysis on these parameters showed some significant intra-OTU differences, e.g., L, NA, NL.However, a consistent inter-groups difference between OTUs C and D was not observed.

Discussion
Although morphology was for almost a century the guiding principle of trypanosomatid taxonomy in general and fish trypanosomes in particular, for these morphologically highly variable flagellates this approach eventually became even less useful than for other protist groups (Letch 1979;Maslov et al. 2013).However, the application of molecular characters, now widely applied in the field of trypanosome research, turned out to be transformative.Here, we generated new sequences, and analyzed previously available sequence data, with the aim of reevaluating the taxonomy and systematics of freshwater fish trypanosomes that were so far rather neglected in this respect.
To avoid mixed infections (Grybchuk-Ieremenko et al. 2014), DNA was obtained either from clonal populations (Su et al. 2022) or even from single cells (Chen et al. 2022).1.4-3.9 1 3 When trying to analyze as many fish trypanosomes as possible, we encountered the problem of their frequent unavailability for DNA isolation.Another limiting aspect was that for numerous isolates, only short fragments of the 18S rRNA gene are available in GenBank.Indeed, fragments of the 18S rRNA gene shorter than 1.4 kb (Grybchuk-Ieremenko et al. 2014), or lacking the V7 and V8 domains, turned out to be unsuitable for phylogenetic analyses (Supplementary Fig. S2).One way of dealing with this problem is to predict the missing regions, allowing at least limited inferences for the corresponding trypanosome species.
Taking into consideration the varying length of sequences and their sometimes questionable quality, we decided to analyze only sequences with gaps excluded.Based on the 18S rRNA gene, all freshwater fish trypanosomes for which sequences are available fall into the following four major clades: (i) OTU A/B, (ii) OTU C- clade (I, D, G, E,  O, C, F and H), and more distant, (iii) J/K/L, and (iv) M/N.This subdivision revealed that the T. carassii isolates TrCa and MARV (in OTU C) are not closely related to other T. carassii isolates (in OTU A/B), underlining the need for a reevaluation of the taxonomy of these trypanosomes (Fig. 1).
First of all, OTUs J/K/L and M/N clearly represent two distinct species, while the picture is more complex for the other 10 OTUs.The latter can be split into T. carassii (OTUs A/B) and eight distinct species (= scenario 1) or, following a more conservative approach, this group would be composed of T. carassii (OTUs A/B, syn.Trypanosoma Scenario 1 is based on genetic distances among fullsize 18S rRNAs, excluding gaps, with intra-species differences below 0.4% and inter-species differences exceeding 0.6%.Very similar criteria have been applied in the case of trypanosomes of birds (Šlapeta et al. 2016).In the frame of this scenario, we suggest retaining TrCa as T. danilewskyi (syn.Trypanosoma micropteri) (Bienek et al. 2002), with the inclusion of other OTU C members, e. g., the MARV isolate.This would resolve a confusing situation associated with synonymizing T. carassii and T. danilewskyi (Lom and Dyková 1992), and numerous studies on T. carassii which actually used the T. danilewskyi isolates (Hagen et al. 2014;Islam and Woo 1991;Kovacevic et al. 2015;Wang and Belosevic 1994).Furthermore, T. granulosum shall be retained for OTU G, with OTU D representing T. ophiocephali (syn.T. siniperca), and OTUs E, O, I, H and F representing Cc, Ct1, CLAR, E1 and S1, respectively, all qualifying as separate species.
In scenario 2, we applied criteria corresponding to 3% differences in the 300 nt-long hypervariable V7 region (Smit et al. 2020) and 1% differences in the 1.4 kb-long region (Díaz et al. 2020), namely (excluding the gaps criteria of full-length 18S rRNA) intra-species distances less than 0.8%, and inter-species distances over 1.0%.Under these criteria, T. danilewskyi expands to include T. ophiocephali and T. siniperca (OTU D), as well as the unnamed trypanosomes from OTUs E and O, with T. granulosum restricted to isolate Portugal (OTU G), while OTUs F/H and I appear to be very closely related to African freshwater fish trypanosomes (only ~ 300 bp sequences available), which is potentially T. mukasai (Smit et al. 2020).
Finally, scenario 3 operates with excluded gaps, intraspecies distances below 1.8% and inter-species distances above 4.0%.Such genetic distances are similar to those among members of the Trypanosoma cruzi complex, known to be highly divergent (Zingales et al. 2009), and Trypanosoma rangeli (Stevens et al. 1999).If such relaxed parameters were applied, all freshwater fish trypanosomes (10 OTUs) would belong to a single species-T.carassii.
Such a view finds some support in the highly divergent and thus problematic ITS1 data (Supplementary Fig. S9).Indeed, the differences in ITS1 among T. cruzi strains and among freshwater fish trypanosomes are very similar (Lima et al. 2015;Marcili et al. 2009).
Moreover, documented host ranges also support the last scenario.The overlap among OTUs A, B and C in the orders of fish hosts suggests the potential of different hemoflagellate isolates to infect the same fish species (Supplementary Fig. S10; Table 1) and, at the same time, the broad range of hosts firmly excludes the one-host onetrypanosome species paradigm.
This conclusion finds additional support in the morphology-based literature, which reports significant variability in the size and shape of these protists in the course of their life cycle, as well as depending on the intensity of the infection (Becker 1967;Gupta et al. 2006;Letch 1979;Mackerras and Mackerras 1961;Qadri 1962).Moreover, although there is a clear phylogenetic boundary between marine and freshwater fish trypanosomes, they may overlap in the estuarine environment, as was shown by artificial infections of the euryhaline tilapia (Chen et al. 2022).Therefore, the habitat-specific phylogeny may reflect an independent evolution of invertebrate vectors, such as leeches.
In conclusion, by interpreting all sequence data available for freshwater fish trypanosomes, we provide three alternative scenarios for their taxonomy.While species definition remains, unsurprisingly, ambiguous, we find best support for the existence of three robust species of freshwater fish trypanosomes, namely the umbrella complex of Trypanosoma carassii, Trypanosoma abeli (OTUs J/K/L) and yet-to-be named Trypanosoma sp.(OTUs M/N).

Phylogenetic analysis
The obtained sequences were compared with the publicly available sequences using BLAST (http:// www.ncbi.nlm.nih.gov/ blast/).The 18S rRNA sequences, accession numbers and others information (Table 1) for freshwater fish trypanosomes were obtained from the GenBank database.Sequences were aligned by Clustal X (Thompson et al. 1997) using default settings and with final manual adjustments.The neighbor-joining and maximum likelihood methods were used to create phylogenetic trees by MEGA VII (Kumar et al. 2016) under Kimura's two-parameter model with gamma distributed and invariant sites (G + I), pairwise deletion for gaps and bootstrap of 1000 replicates.
Genetic distances between the newly obtained and available 18S rRNAs of freshwater fish trypanosomes were calculated by MEGA-VII and BLAST + 2.8.1 using the following formula: genetic distances = 1 − the number of bases that can be paired between two sequences/the part of the alignment of the two sequences (1 − sequence identity).
DnaSP v6 software was used to calculate the average number of nucleotide differences (Pi) within a population (K), the average number of nucleotide differences among populations (Kxy) and the degree of nucleotide ambiguity (Dxy, p-distance).Significance analysis of morphological data was performed with Z-test (Zhang et al. 2022).
Parsimony (TCS) and neighbor networks of haplotype diversity were generated using DnaSP, PopArt and Split-sTree software, with missing data replaced in rotation by corresponding sites from full-length (> 2 kb) operational taxonomic units (OTUs) (Supplementary Fig. S11).

Fig. 1
Fig. 1 Phylogenetic analysis of 18S rRNA gene sequences from trypanosomes.18S rRNA-based neighbor-joining phylogenetic tree of trypanosomes, bar represents 0.05 substitutions per site.Other trypanosomes (outgroup) are those of the marine clade and avian/mammalian species

Fig. 2
Fig. 2 Parsimony network (TCS) of haplotype diversity based on 18S rRNA gene.The size of the circles is proportional to haplotype frequency and the colors represent the OTUs to which they belong.The mutation steps between each OTU pair are indicated

Fig. 4
Fig. 4 Morphological parameters of trypanosomes from different OTUs.PK posterior end to kinetoplast; KN kinetoplast to nucleus center; PN posterior end to nucleus center; NA nucleus center to anterior end; BL body length; FF free flagellum length; L total length

Table 1
Summarized information for trypanosomes used in this study T. granulosum Portugal T. granulosumLaveran  & Mesnil, 1909

Table 4
Morphometrics of freshwater fish trypanosomesBiometric data (center to center distances across the cell axis) in μm are provided as mean ± SD and ranges: PK posterior end to kinetoplast; KN kinetoplast to nucleus; PN posterior end to nucleus; NA nucleus to anterior end; BL body length; FF free flagellum; L total length; NL nucleus length; NW nucleus width; BW body width; NI nucleus index (PN/NA); KI kinetoplast index (PN/KN); FI flagellum index (FF/BL).#, two sets data were found from the published data.*, recalculated with the provided means of biometric data according to the definition