Comparative genomics of Leishmania (Mundinia)
Trypanosomatids of the genus Leishmania are parasites of mammals or reptiles transmitted by bloodsucking dipterans. Many species of these flagellates cause important human diseases with clinical symptoms ranging from skin sores to life-threatening damage of visceral organs. The genus Leishmania contains four subgenera: Leishmania, Sauroleishmania, Viannia, and Mundinia. The last subgenus has been established recently and remains understudied, although Mundinia contains human-infecting species. In addition, it is interesting from the evolutionary viewpoint, representing the earliest branch within the genus and possibly with a different type of vector. Here we analyzed the genomes of L. (M.) martiniquensis, L. (M.) enriettii and L. (M.) macropodum to better understand the biology and evolution of these parasites.
All three genomes analyzed were approximately of the same size (~ 30 Mb) and similar to that of L. (Sauroleishmania) tarentolae, but smaller than those of the members of subgenera Leishmania and Viannia, or the genus Endotrypanum (~ 32 Mb). This difference was explained by domination of gene losses over gains and contractions over expansions at the Mundinia node, although only a few of these genes could be identified. The analysis predicts significant changes in the Mundinia cell surface architecture, with the most important ones relating to losses of LPG-modifying side chain galactosyltransferases and arabinosyltransferases, as well as β-amastins. Among other important changes were gene family contractions for the oxygen-sensing adenylate cyclases and FYVE zinc finger-containing proteins.
We suggest that adaptation of Mundinia to different vectors and hosts has led to alternative host-parasite relationships and, thereby, made some proteins redundant. Thus, the evolution of genomes in the genus Leishmania and, in particular, in the subgenus Mundinia was mainly shaped by host (or vector) switches.
Keywords: Whole genome sequencing; Leishmania (Mundinia) enriettii; L. (M.) macropodum; L. (M.) martiniquensis
PIG-Y: Phosphatidylinositol glycan class Y protein
SCA: Side chain arabinosyltransferase
SCG: Side chain galactosyltransferase
Obligate flagellate parasites of the family Trypanosomatidae infect insects, leeches, vertebrates, and plants [1, 2, 3]. They have one (monoxenous species) or two hosts (dixenous species) in their life cycle [4, 5, 6]. Dixenous representatives belong to the genera Endotrypanum, Leishmania, Paraleishmania, Phytomonas, and Trypanosoma, and some of them are of medical and/or economic importance [7, 8, 9]. It is generally accepted that all dixenous trypanosomatids have originated from their monoxenous kin. Supporting this, in the current taxonomical system, the dixenous genera Endotrypanum, Leishmania, and Paraleishmania are united with the monoxenous genera Borovskyia, Crithidia, Leptomonas, Lotmaria, Novymonas, and Zelonia into the subfamily Leishmaniinae [11, 12], while the dixenous genus Phytomonas is included in the subfamily Phytomonadinae along with the monoxenous genera Herpetomonas and Lafontella.
Parasites of the genus Leishmania infect mammals or reptiles and cause various diseases named leishmaniases. For humans, this translates into over 350 million people being at risk of infection, primarily in the tropical and subtropical regions. These parasites are transmitted by bloodsucking phlebotomine sand flies (Psychodidae) or, possibly, biting midges (Ceratopogonidae) [15, 16], and the infection manifests as a range of clinical symptoms from innocuous skin lesions to fatal visceral organ failure.
Currently, four subgenera are recognized within the genus Leishmania: Leishmania (Leishmania), L. (Mundinia), L. (Sauroleishmania), and L. (Viannia). They are not only well-defined phylogenetically, but can also be delineated by host specificity or clinical picture. The most enigmatic of them is Mundinia, the last established subgenus, which, as of now, contains only four described species: L. enriettii, L. macropodum, L. martiniquensis, and L. orientalis [19, 20, 21, 22]. In addition, there are isolates from Ghana, likely representing a separate species, which is phylogenetically close to L. orientalis.
Leishmania (Mundinia) spp. are of special interest for, at least, four main reasons. Firstly, in this group, human pathogens – L. (M.) orientalis, L. (M.) martiniquensis and parasites from Ghana – are intermingled with species non-pathogenic to humans, namely L. (M.) enriettii and L. (M.) macropodum [20, 23]. Leishmania (M.) enriettii infects guinea pigs in South America [24, 25], while L. (M.) macropodum was found in Australian macropods [26, 27]. In addition, parasites apparently belonging to L. martiniquensis have been also recorded in cows and horses [28, 29, 30]. Secondly, a significant portion of human patients infected with Leishmania (Mundinia) are immunocompromised [31, 32, 33], indicating that these parasites may actively explore new developmental niches [10, 34]. A similar situation has been documented in some thermo-tolerant monoxenous trypanosomatids [35, 36, 37]. Thirdly, Mundinia spp. may be transmitted primarily not by phlebotomine sand flies of the genera Phlebotomus and Lutzomyia as for other leishmaniae, but by biting midges or other genera of sand flies, although more work is needed to confirm this with certainty [15, 38]. Fourthly, and finally, in all phylogenetic reconstructions, L. (Mundinia) represents the earliest branch within the genus Leishmania, suggesting its ancient origin prior to the breakup of Gondwana [2, 39].
For all these reasons, members of the subgenus Mundinia are crucial for comparative genomic analyses, as they may shed light on the evolution of Leishmania and its pathogenicity for humans. Similar analyses have been reported for L. (Sauroleishmania) [40, 41], L. (Viannia) [42, 43, 44, 45], and L. (Leishmania) [46, 47], leaving Mundinia understudied in this respect.
In this work, we sequenced and analyzed genomes of three Leishmania (Mundinia) species, which represent the major clades of the subgenus: L. (M.) enriettii MCAV/BR/1945/LV90 originating from southern Brazil, L. (M.) macropodum MMAC/AU/2004/AM-2004 originating from northern Australia, and L. (M.) martiniquensis MHOM/MQ/1992/MAR1 originating from the Caribbean island of Martinique. The genomic sequence of L. (M.) enriettii MCAV/BR/1945/LV90 complemented a previously obtained one, which belongs to a different isolate of the same species (MCAV/BR/1995/CUR3) and is available from the TriTryp database.
Origin of isolates, cultivation, amplification, sequencing and species verification
Promastigotes were cultured in the M199 medium (Sigma-Aldrich, St. Louis, MO, United States) containing 10% heat-inactivated fetal bovine serum (FBS; Thermo Fisher Scientific, Waltham, MA, United States), supplemented with 1% Basal Medium Eagle vitamins (Sigma-Aldrich), 2% sterile urine and 250 μg/ml of amikacin (Bristol-Myers Squibb, New York, NY, United States).
Total genomic DNA was isolated from 10 ml of trypanosomatid cultures with the DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions. The 18S rRNA gene was amplified using primers S762 and S763, following the previously described protocol. These PCR fragments were sequenced directly at Macrogen Europe (Amsterdam, Netherlands) as described previously. The identity of the species under study was confirmed by BLAST analysis.
Whole-genome and whole-transcriptome sequencing and analysis
The genomes and whole transcriptomes of Leishmania (Mundinia) isolates were sequenced as described previously [35, 51, 52] using the Illumina HiSeq and NovaSeq technologies, respectively, with TruSeq adapters for library preparation, at Macrogen Inc. (Seoul, South Korea). On average, 43 and 47 million 100 nt paired-end raw reads were produced for genomes and transcriptomes, respectively (see statistics below). The genome completeness and annotation quality were assessed using the BUSCO software. The raw reads were trimmed with Trimmomatic v. 0.32 with the following settings: ILLUMINACLIP:TruSeq3-PE-2.fa:2:20:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:75, quality-checked with FastQC v. 0.11.5, and then assembled de novo with the SPAdes genome assembler v. 3.10.1 with the default settings and automatic k-mer selection (k-mers of 21, 33 and 55 were used). The Trinity assembler v. 2.4.0 was used to reconstruct the transcriptomes de novo with the minimal contig length of 150. The resulting genome assemblies were investigated for potential contamination using the BlobTools software, with Bowtie2 for genome read mapping and Hisat2 for transcriptome read mapping, both with the default settings. A read pair was retained only if at least one of its reads mapped to a contig with transcriptome read coverage higher than 10 or to a contig with a Leishmania, Leptomonas, or Trypanosoma term among the first 100 best Diamond hits. All other read pairs were filtered out (Additional file 1: Figure S1, Additional file 2: Figure S2, Additional file 3: Figure S3, Additional file 4: Figure S4, Additional file 5: Figure S5, Additional file 6: Figure S6). The resulting assemblies (CovPlots, Additional file 7: Figure S7, Additional file 8: Figure S8, Additional file 9: Figure S9) were further inspected and curated manually. Parameters of the genome assemblies were estimated using QUAST v. 4.5.
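The read-pair retention rule described above can be sketched in a few lines of Python. This is only an illustration of the stated logic, not the pipeline used in the study: the function names, the data layout (a dictionary mapping each contig to its transcriptome coverage and a ranked list of Diamond hit descriptions) and the toy values are all assumptions.

```python
# Hypothetical sketch of the decontamination rule; names and data layout are assumed.
TARGET_TERMS = ("Leishmania", "Leptomonas", "Trypanosoma")


def contig_is_trusted(rna_coverage, diamond_hits, min_cov=10.0, top_n=100):
    """A contig is trusted if its transcriptome coverage exceeds min_cov or a
    target genus name occurs among its first top_n Diamond hit descriptions."""
    if rna_coverage > min_cov:
        return True
    return any(term in hit for hit in diamond_hits[:top_n] for term in TARGET_TERMS)


def keep_read_pair(mate_contigs, contig_info):
    """Keep a pair if at least one mate maps to a trusted contig.
    mate_contigs holds the contig ID of each mate (None if unmapped)."""
    for contig_id in mate_contigs:
        if contig_id is None or contig_id not in contig_info:
            continue
        coverage, hits = contig_info[contig_id]
        if contig_is_trusted(coverage, hits):
            return True
    return False


# Toy data (invented for illustration only).
contig_info = {
    "c1": (25.0, ["Bacillus subtilis"]),                    # high RNA coverage
    "c2": (2.0, ["Escherichia coli", "Leishmania major"]),  # target term in hits
    "c3": (1.0, ["Escherichia coli"]),                      # fails both criteria
}
```

With these toy values, a pair mapping to c1 and c3 would be kept, whereas a pair whose only mapped mate lies in c3 would be discarded.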
Raw reads were submitted to NCBI SRA under accession numbers SRX5006814, SRX5006815, and SRX5006816 (Bioproject: PRJNA505413) for L. (M.) enriettii MCAV/BR/1945/LV90, L. (M.) macropodum MMAC/AU/2004/AM-2004, and L. (M.) martiniquensis MHOM/MQ/1992/MAR1, respectively.
Genome annotation was performed with the Companion software using transcriptome evidence, Leishmania major as a reference organism, and pseudochromosome contiguation with default settings. Transcriptome evidence was generated with Cufflinks, and mapping was performed with Hisat2 using the --dta-cufflinks parameter.
Synteny analysis was performed using SyMAP v. 4.2 with the following settings: minimum size of sequence to load, 500 bp; minimum number of anchors required to define a synteny block, 7; synteny blocks were merged in case of overlaps, and only the larger block was kept if two synteny blocks overlapped on a chromosome. For the Leishmania (Mundinia) genomes sequenced in this study, the pseudochromosome-level assembly built using the Companion software with the L. major Friedlin genome as a reference was used for the analysis instead of scaffolds in order to reduce computational time.
Genome coverage analysis and ploidy estimation
Per-base read coverage was calculated for the fifty longest scaffolds and all pseudochromosome-level sequences using the BEDTools v. 2.26.0 genomecov tool on the read mappings generated with Bowtie2 as described above. Mean genome and scaffold/pseudochromosome coverage was calculated using a custom Python script. Ploidy was estimated based on relative coverage values: mean coverage for each of the fifty longest scaffolds and all pseudochromosome-level sequences was divided by mean genome coverage, and ploidy was inferred under the assumption that the majority of chromosomes are diploid. Coverage plots for the 50 longest scaffolds were generated using the weeSAM tool v. 1.5 (http://bioinformatics.cvr.ac.uk/blog/weesam-version-1-5/).
Prior to variant calling, duplicate removal and local re-alignment were performed on the respective read mappings using GATK v. 18.104.22.168 MarkDuplicates and IndelRealigner tools with the following parameter differing from the default: --REMOVE_DUPLICATES = true. Variant calling was performed using Platypus v. 0.1.5 with the default settings, and only SNPs were considered in further analyses.
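The relative-coverage approach to ploidy described above can be illustrated with a minimal Python sketch. It is not the custom script used in the study. It assumes per-base depth records with three tab-separated columns (sequence name, position, depth), and all function names and toy depths are hypothetical.

```python
from collections import defaultdict


def read_depths(lines):
    """Parse 'name<TAB>position<TAB>depth' records into per-sequence depth lists."""
    depths = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        name, _position, depth = line.rstrip("\n").split("\t")
        depths[name].append(int(depth))
    return depths


def estimate_ploidy(depths, baseline=2):
    """Divide each sequence's mean coverage by the genome-wide mean and scale
    by the baseline ploidy (diploid, as assumed for most chromosomes)."""
    total_depth = sum(sum(d) for d in depths.values())
    total_bases = sum(len(d) for d in depths.values())
    genome_mean = total_depth / total_bases
    estimates = {}
    for name, d in depths.items():
        relative = (sum(d) / len(d)) / genome_mean
        estimates[name] = (relative, round(relative * baseline))
    return estimates


# Toy depths (invented): three sequences at ~60x and one short sequence at ~120x.
toy = {
    "chr01": [60] * 1000,
    "chr02": [60] * 1000,
    "chr03": [60] * 1000,
    "chr31": [120] * 100,
}
ploidy = estimate_ploidy(toy)
```

In this toy case the sequence with doubled coverage is inferred as tetrasomic and the others as disomic. Because the genome-wide mean is itself shifted by aneuploid sequences, the estimate is only reliable when most of the genome is at the baseline ploidy, which is the assumption stated in the text.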
Inference of protein orthologous groups and phylogenomic analyses
Analysis of protein orthologous groups (OGs) was performed on a dataset containing 41 trypanosomatid species (including four representatives of the subgenus Mundinia, Additional file 16: Table S1) and a eubodonid Bodo saltans as an outgroup, using OrthoFinder v. 1.1.8 with the default settings. Out of a total of 551 OGs containing only one protein for each species, 92 were selected for the phylogenomic inference according to the following criteria: i) average percent identity within the group ≥60%; ii) maximum percentage of gaps per sequence in the alignment before trimming ≤40%; iii) maximum percentage of gaps per sequence in the alignment after trimming ≤10%. The amino acid sequences of each gene were aligned using Muscle v. 3.8.31. The average percent identity within each OG was calculated using the alistat script from the HMMER package v. 3.1. The alignments were trimmed using trimAl v. 1.4.rev22 with the “-strict” option. The resulting concatenated alignment contained 32,460 columns. The maximum likelihood tree was inferred in IQ-TREE v. 1.6.3 with the JTT + F + I + G4 model and 1000 bootstrap replicates [69, 70]. For the construction of the Bayesian tree, PhyloBayes-MPI 1.7b was run for over 9000 iterations under the GTR-CAT model with four discrete gamma categories. Every second tree was sampled and the first 25% of them were discarded as “burn-in”. The final tree was visualized using FigTree v. 1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/). Gains/losses and expansions/contractions of protein families were analyzed using the COUNT software with Dollo’s and Wagner’s (gain penalty set to 3) parsimony algorithms, respectively. For gene ontology (GO) annotation of gene families gained/lost/expanded/contracted at certain nodes, the Blast2GO Basic software was used with the maximum number of BLAST hits set to 10 and other settings left as default.
Assignment of KEGG IDs to the proteins of interest was performed via the BlastKOALA server with a target database of eukaryotes and prokaryotes at the family and genus levels, respectively. The analysis of OGs shared among Leishmania was performed using the UpSetR package.
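The three selection criteria for single-copy OGs listed above amount to a simple filter. The Python sketch below shows one way to express it. It assumes the average percent identity has already been computed (for example, by alistat) and that alignments are available as lists of equal-length gapped strings; the function names are hypothetical.

```python
def max_gap_fraction(aligned_seqs):
    """Largest per-sequence fraction of gap characters in an alignment."""
    return max(seq.count("-") / len(seq) for seq in aligned_seqs)


def passes_criteria(avg_identity, raw_alignment, trimmed_alignment,
                    min_identity=60.0, max_gaps_raw=0.40, max_gaps_trimmed=0.10):
    """Apply criteria i-iii: identity >= 60%, gaps per sequence <= 40% before
    trimming and <= 10% after trimming."""
    return (avg_identity >= min_identity
            and max_gap_fraction(raw_alignment) <= max_gaps_raw
            and max_gap_fraction(trimmed_alignment) <= max_gaps_trimmed)


# Toy alignments (invented for illustration).
raw = ["MKV-", "MKVL"]
trimmed = ["MKV", "MKV"]
```

Thresholds are exposed as keyword arguments so that the effect of relaxing or tightening any single criterion on the number of retained OGs can be checked.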
Analysis of amastin repertoire
Amastin sequences of L. major Friedlin, Trypanosoma brucei TREU927, and Trypanosoma cruzi CL Brener Esmeraldo were downloaded from the TriTrypDB release 41 and used as queries in a BLAST search with an E-value threshold of 10⁻²⁰ against a database of annotated proteins of Crithidia fasciculata, Endotrypanum monterogeii, Leishmania braziliensis MHOM/BR/75/M2904, Leishmania (Mundinia) spp., Leptomonas pyrrhocoris H10, and Trypanosoma grayi ANR4. The resulting sequences were aligned using Muscle v. 3.8.31 with the default parameters. P-distances were calculated using the MEGA 7 software, and the hits with p-distance to the α-amastin of T. brucei (Additional file 17: Table S2) exceeding 0.9 and query coverage < 50% were excluded from further analyses. The resulting alignment was trimmed using trimAl v. 1.4.rev22 with the ‘-gappyout’ option. The maximum-likelihood phylogenetic tree was inferred on the final dataset containing 384 sequences and 436 characters using IQ-TREE v. 1.5.3 with the VT + F + G4 model and 1000 bootstrap replicates [69, 70].
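For readers unfamiliar with the measure, the p-distance is the proportion of aligned sites at which two sequences differ. The Python sketch below illustrates the calculation and the exclusion rule stated above. It assumes pairwise deletion of gapped sites, which may differ from the MEGA settings actually used, and the function names are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap (pairwise deletion)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def exclude_hit(distance, query_coverage_percent,
                max_distance=0.9, min_coverage=50.0):
    """Exclusion rule as worded in the text: p-distance above 0.9
    and query coverage below 50%."""
    return distance > max_distance and query_coverage_percent < min_coverage
```

For example, the aligned pair "ACDE-" and "ACDFG" shares four ungapped columns, one of which differs, giving a p-distance of 0.25.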
Analysis of side chain galactosyltransferases
The identification of the side chain galactosyltransferases (SCGs) was performed as described previously. Proteins with p-distances to SCGs of L. major exceeding 0.8 were excluded from further analysis (Additional file 18: Table S3 and Additional file 19: Table S4). Phylogenetic reconstruction was performed using IQ-TREE v. 1.5.3 with 1000 bootstrap replicates and the VT + F + I + G4 and JTT + F + G4 models for the SCGs and side chain arabinosyltransferases (SCAs), respectively.
Analyses of other proteins within OGs gained/lost at certain nodes
For the identification of putative phosphatidylinositol glycan class Y proteins (PIG-Y), we performed sensitive homology searches using the HMMER package v. 3.1 and a model built using aligned sequences of trypanosomatid proteins annotated as PIG-Y in the TriTrypDB release 41. Phylogenetic analysis of PIG-Y was performed similarly to that of amastins, with JTT + I + G4 as the best-fitting model and excluding sequences with p-distances to the reference set higher than 0.8 (Additional file 20: Table S5). The analysis of ferrochelatase sequences was performed similarly (Additional file 21: Table S6), with the JTT + I + G4 phylogenetic model.
Assembly and annotation of three Leishmania (Mundinia) genomes
The three sequenced genomes were assembled and annotated, yielding total lengths of 29.95, 29.59, and 29.83 Mbp for L. (M.) martiniquensis MHOM/MQ/1992/MAR1, L. (M.) macropodum MMAC/AU/2004/AM-2004, and L. (M.) enriettii MCAV/BR/1945/LV90, respectively, for the scaffolds longer than 500 bp (Additional file 22: Table S7). The N50 values and largest scaffold sizes varied from 24.17 to 33.45 kbp, and from 181 to 225 kbp for L. (M.) enriettii and L. (M.) martiniquensis, respectively. Genomic read coverage analysis (Additional file 10: Figure S10) indicates that coverage is fairly uniform across Mundinia genome assemblies, with the regions of coverage close to median values (exceeding 40x but lower than 150x) together accounting for ~ 91, 89 and 80% of genome assembly length for L. (M.) martiniquensis, L. (M.) macropodum, and L. (M.) enriettii, respectively. The results of variant calling suggest that the genome of L. (M.) enriettii, carrying 12,379 SNPs, is characterized by higher variation levels than those of L. (M.) martiniquensis and L. (M.) macropodum, with 1765 and 4834 identified SNPs, respectively (Additional file 22: Table S7). The numbers of homozygous SNPs identified in the L. (M.) martiniquensis, L. (M.) macropodum, and L. (M.) enriettii genome assemblies were as low as 64, 67 and 121, respectively, suggesting a minimal number of misassembly events (Additional file 22: Table S7).
Expectedly, the results of ploidy analysis suggest that Leishmania (Mundinia) spp. demonstrate a variable degree of aneuploidy (Additional file 23: Table S8). In L. (M.) martiniquensis all pseudochromosome-level sequences appear to be diploid, except for chromosome 31. The genome of L. (M.) enriettii displays the highest level of aneuploidy among the analyzed species, with nine chromosomes of variable ploidy levels (Additional file 23: Table S8).
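Two of the assembly statistics quoted above, N50 and the share of the assembly whose coverage falls within a given window, can be computed as in the following Python sketch. It illustrates the standard definitions only, is not the QUAST implementation, and uses hypothetical function names.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L contain at least half
    of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no scaffold lengths given")


def fraction_in_window(depths, low=40, high=150):
    """Fraction of positions with depth strictly between low and high."""
    inside = sum(1 for depth in depths if low < depth < high)
    return inside / len(depths)
```

For scaffold lengths of 10, 8, 6, 4 and 2 kbp (30 kbp in total), the cumulative length first reaches 15 kbp at the 8 kbp scaffold, so the N50 is 8 kbp.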
Gene gains and losses at the Leishmania (Mundinia) node
The Leishmania (Mundinia) node was heavily dominated by gene losses. There were 13 gained and 234 lost OGs at this node (Fig. 2, Additional file 24: Table S9). All 13 gained and 148 lost OGs contained genes encoding hypothetical proteins. In contrast, the node uniting the three remaining subgenera was dominated by gene gains with 79 gained (71 OGs contained genes encoding hypothetical proteins) and 34 lost (22 OGs contained genes encoding hypothetical proteins) (Fig. 2, Additional file 25: Table S10).
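The gain and loss counts reported here rest on Dollo parsimony, under which a gene family is gained exactly once and may then be lost independently in several lineages. The Python sketch below illustrates the principle on a toy tree. It is not the COUNT implementation, and the tree topology, node labels and presence pattern are invented for illustration.

```python
def leaves(node):
    """Leaf names under a node. A leaf is a string; an internal node is a
    (label, [children]) tuple."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node[1]:
        found |= leaves(child)
    return found


def label(node):
    return node if isinstance(node, str) else node[0]


def dollo(tree, present):
    """Return (gain_node, loss_nodes) for one family under Dollo parsimony."""
    present = set(present)
    if not present:
        raise ValueError("family must be present in at least one taxon")
    # The single gain is placed at the deepest node covering all carriers.
    node = tree
    while not isinstance(node, str):
        covering = [c for c in node[1] if present <= leaves(c)]
        if not covering:
            break
        node = covering[0]
    losses = []

    def walk(current):
        # The family is present at `current`; a child subtree without any
        # carrier counts as one loss on the branch leading to that child.
        if isinstance(current, str):
            return
        for child in current[1]:
            if leaves(child) & present:
                walk(child)
            else:
                losses.append(label(child))

    walk(node)
    return label(node), losses


# Toy tree (topology and labels are illustrative only).
toy_tree = ("root", [
    ("Leishmania", [
        ("Mundinia", ["L_enriettii", "L_macropodum", "L_martiniquensis"]),
        ("other_subgenera", ["L_major", "L_braziliensis", "L_tarentolae"]),
    ]),
    "Endotrypanum",
])
gain, losses = dollo(toy_tree, {"L_major", "L_braziliensis", "Endotrypanum"})
```

In this toy case the family is inferred as gained at the root and lost twice, once on the branch leading to the Mundinia clade and once in L_tarentolae. Note that a single spurious hit in one distant taxon is enough to push the inferred gain to a deeper node, which is why individual inferences need manual inspection.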
The annotations for sequences within OGs lost at the L. (Mundinia) node indicate changes in the surface architecture of the parasites of this subgenus, exemplified by the losses of putative amastins, glycosylphosphatidylinositol (GPI) anchor biosynthesis and turnover proteins. Amastins are a large family of surface glycoproteins, highly expressed in the amastigote stage of several trypanosomatids, such as T. cruzi and Leishmania spp. They are essential for establishing infection in macrophages [80, 81] and, therefore, are significantly reduced in lizard-parasitizing L. tarentolae, which cannot efficiently replicate in this type of cells and rarely forms amastigotes.
The results of our gene content evolution analyses suggest that three OGs containing putative amastins were lost at the L. (Mundinia) node (Additional file 24: Table S9). According to the phylogenetic analysis (Additional file 12: Figure S12), two of those OGs – OG0008773 and OG0009479 (Additional file 24: Table S9) – contain putative β-amastin-like proteins, homologues of which were lost in all analyzed Leishmania spp. except for L. major and L. braziliensis, respectively. OG0009537 incorporates γ-amastin-related proteins, identified in the genomes of the monoxenous Leishmaniinae, but lost in all L. (Leishmania) spp. Overall, 33, 19 and 23 amastin-like sequences were identified in L. (M.) martiniquensis, L. (M.) macropodum, and L. (M.) enriettii, respectively. L. (Mundinia) genomes encode representatives of all four amastin subfamilies, including Leishmania-specific δ-amastins.
The amastin polypeptides are linked to the parasite’s outer membrane via a GPI anchor [83, 84]. Two enzymes involved in GPI-anchor synthesis and GPI-anchored protein turnover, phosphatidylinositol N-acetylglucosaminyltransferase (subunit Y) and glycosylphosphatidylinositol phospholipase-C (GPI-PLC), respectively, also appear to be lost at the L. (Mundinia) node. However, a careful inspection of the results has shown that GPI-PLC is absent not only from Mundinia, but also from other subgenera of Leishmania, as well as from Endotrypanum. The only exception is L. panamensis with a partial sequence of unknown function returning a short hit to the GPI-PLC. This hit resulted in erroneous inference of the putative GPI-PLC presence at the L. (Leishmania) node by the Dollo’s parsimony algorithm. Putative GPI-PLCs have been identified in all species within our dataset, except for dixenous Leishmaniinae, C. expoeki, and Phytomonas spp. In trypanosomatids, phosphatidylinositol N-acetylglucosaminyltransferase, the enzyme catalyzing the first step of GPI biosynthesis, is composed of seven proteins: phosphatidylinositol glycan class A (PIG-A), PIG-C, PIG-H, PIG-Q, PIG-P, PIG-Y, and dolichyl-phosphate mannosyltransferase polypeptide 2 (DPM2). All these proteins were identified in L. (Mundinia), with the exception of DPM2 and PIG-Y being absent from the genome of L. (M.) macropodum. The analysis of orthologous groups revealed that PIG-Y sequences fall into two different OGs, one of which appears to be absent in L. (Mundinia). More sensitive HMM-based searches led to the identification of PIG-Y proteins in several other trypanosomatids. The phylogenetic analysis confirmed the presence of two separate groups of PIG-Y sequences, only one of which contains L. (Mundinia) subunits (Additional file 13: Figure S13). Most of the L. (Leishmania) sequences fall into the latter group, while the representatives of the other clade appear to be in the process of pseudogenization in L. (Leishmania), as suggested by the presence of the identifiable pseudogenes in L. major and L. tarentolae.
We have also analyzed the repertoire of side chain galactosyltransferases (SCGs) and side chain arabinosyltransferases (SCAs), performing chemical modifications of the GPI-anchored lipophosphoglycan (LPG) on the cell surface of the Leishmaniinae [77, 86, 87], with the potential effect on host-parasite interactions [88, 89, 90]. The genome of L. (M.) martiniquensis encodes five SCGs, while those of L. (M.) macropodum and L. (M.) enriettii, sequenced in this study, contain four putative members of SCG/L/R family (Additional file 14: Figure S14). Thus, in L. (Mundinia) the number of SCG-encoding genes is substantially lower than in L. major, L. braziliensis and L. infantum, carrying 14, 17 and 12 genes, respectively. L. (Mundinia) SCG proteins cluster with those of L. braziliensis, and together they form a sister clade to the SCGs of L. major and L. infantum. In addition, L. (Mundinia) spp. contain sequences related to the SCGR1–6, while putative SCGL-encoding genes were not identified, similarly to the situation observed in L. braziliensis [91, 92]. Overall, the SCG/L/R repertoire in L. (Mundinia) is most similar to the one in L. braziliensis, with the exception of the SCG expansion in L. braziliensis, which is not documented in L. (Mundinia). In addition, L. (Mundinia) spp. possess SCA and SCA-like sequences, which are absent in L. braziliensis (Additional file 14. Figure S14).
A few genes encoding metabolic proteins appear to be lost in L. (Mundinia). An important enzyme of folate metabolism is methylene-tetrahydrofolate reductase (MTFR), which converts 5-methyltetrahydrofolate into 5,10-methylene-tetrahydrofolate and is required for the formation of activated C1 units used in the synthesis of both thymidylate by thymidylate synthase/dihydrofolate reductase and of methionine from cysteine by methionine synthase [93, 94]. MTFR is present in Bodo saltans, Paratrypanosoma confusum, Blechomonas ayalai, and all Leishmaniinae with the sole exception of L. (Mundinia). In addition to this, it is also absent from trypanosomes and Phytomonas. However, the absence of MTFR does not imply auxotrophy for methionine, since all trypanosomatids seem to be able to synthesize this amino acid by an alternative route using homocysteine S-methyltransferase.
Following the observation that ferrochelatase (FeCH), the terminal enzyme in the heme biosynthetic pathway catalyzing the insertion of iron into protoporphyrin IX, was lost in Leishmania (Additional file 25: Table S10), we have checked the presence of other enzymes of this pathway. Some trypanosomatids (Trypanosoma and Kentomonas) have lost the heme biosynthetic pathway completely, while others retained genes encoding the last three enzymes (Leishmaniinae, Angomonas and Strigomonas), or only ferrochelatase (Phytomonas and Herpetomonas) [97, 98, 99, 100, 101]. Protoporphyrin IX, a substrate of FeCH, is synthesized by the sequential action of coproporphyrinogen oxidase and protoporphyrinogen oxidase. Both enzymes were readily identifiable in the genomes of L. (Mundinia) spp., except for L. (M.) macropodum. Sequences of FeCH clustered in two separate OGs, only one of which incorporates the proteins of all three L. (Mundinia) spp. (Additional file 15: Figure S15). The other OG contains only the sequences of B. ayalai, E. monterogeii, Phytomonas spp., and monoxenous representatives of the subfamily Leishmaniinae. The phylogenetic analysis of FeCH (Additional file 15: Figure S15) suggests the presence of two divergent sequences encoding this protein in the genomes of trypanosomatids, which is in agreement with the results of previous studies concluding that there might have been two different FeCH LGT events from bacteria to kinetoplastids. Indeed, the FeCH sequences of C. fasciculata, falling into two different clades, exhibit only ~ 22% identity, giving best BLAST hits outside the Euglenozoa to the γ-proteobacterial sequences.
Kinetoplastids lack the capacity of de novo lysine biosynthesis. However, B. saltans, Leptomonas and Crithidia spp. use the enzyme diaminopimelate epimerase (DAP) to convert diaminopimelate, an amino acid present in the cell walls of gram-negative bacteria, into lysine. In all other trypanosomatids, including L. (Mundinia), DAP has been lost. The loss of genes encoding this enzyme suggests that most of the trypanosomatids have lost their dependency on bacterial diaminopimelate and, thus, are lysine auxotrophs. Interestingly, the genomes of most L. (Leishmania) spp. still possess easily identifiable diaminopimelate epimerase pseudogenes, while no remnants of DAP-encoding genes could be found in other trypanosomatid genomes. This suggests that these genes could have been acquired by the common ancestor of all Leishmaniinae and then independently lost in different lineages of its dixenous descendants.
Gene family expansions and contractions at the Leishmania (Mundinia) node
In L. (Mundinia), 9 gene families were expanded (3 genes encoding hypothetical proteins) and 40 contracted (7 genes encoding hypothetical proteins) (Fig. 2; Additional file 26: Table S11), while in other subgenera, 11 gene families were expanded (4 genes encoding hypothetical proteins) and 7 contracted (3 genes encoding hypothetical proteins) (Fig. 2; Additional file 27: Table S12). The degree of gene family expansion/contraction is rather moderate, with the family size changes involving from 1 to 5 gene copies (Additional file 26: Table S11, Additional file 27: Table S12).
Oxygen-sensing adenylate cyclases (OG0000628) govern O2-dependent cAMP signaling via protein kinase A, and, consequently, cell survival and proliferation of Leishmania promastigotes under low concentration of oxygen. Contraction of this gene family in L. (Mundinia) suggests that these parasites either rely on different mechanisms to deal with hypoxia or are under different environmental cues during development in their vectors.
Another interesting example is a contracted gene family encoding FYVE zinc finger-containing proteins (OG0001095). In eukaryotes, the FYVE domain is responsible for the recruitment of proteins to different organelles such as multivesicular bodies, endosomes, or phagosomes. Membrane recruitment is mediated by the binding of the FYVE domain to membrane-embedded phosphatidylinositol-3-phosphate. Why this gene family is contracted in L. (Mundinia) remains to be investigated further.
The genomes of the three species of Leishmania (Mundinia) analyzed here are similar in size to that of L. (Sauroleishmania) tarentolae (~ 30 Mb), but smaller than those of the representatives of the subgenera L. (Leishmania) and L. (Viannia), as well as the genus Endotrypanum (~ 32 Mb). This correlates not only with the intuitively understandable domination of gene losses over gains and contractions over expansions, but also with the fact that both Mundinia and Sauroleishmania had switched to new hosts or vectors. The majority of dixenous Leishmaniinae (i.e. Leishmania, Paraleishmania and Endotrypanum) parasitize mammals and are transmitted by phlebotomine sand flies, and this, therefore, is the most likely ancestral variant of the life cycle. Meanwhile, Sauroleishmania spp. switched their vertebrate host from mammals to reptiles, whereas Mundinia spp. have substituted the phlebotomine sand fly hosts with biting midges and/or non-conventional sand flies. We speculate that adaptation to the new hosts or vectors has led to different, possibly simplified, host-parasite relationships and, thereby, made some of the previously used proteins redundant. Indeed, Sauroleishmania spp. demonstrate less specific relationships with their vertebrate hosts as compared to other Leishmania spp. Their promastigotes usually reside in the intestine or in the bloodstream, while occasionally formed amastigotes do not survive in macrophages.
Little is known about the relationships of L. (Mundinia) spp. and their vectors. However, our finding of a significant shrinkage of the repertoires of SCGs and SCAs in Mundinia, which are involved in interactions of promastigotes with the insect gut, implies simplification of the host-parasite relationships. At the same time, amastins and PIG-Y, which are primarily important for the survival of amastigotes in macrophages, showed generally the same evolutionary trends as in L. (Leishmania) and L. (Viannia), i.e. underwent independent losses. Moreover, those were mainly β-amastins, which are expressed in the vectorial part of the life cycle in T. cruzi. In contrast, Sauroleishmania lost all amastigote-specific δ-amastins, whereas all other Leishmania subgenera preserved them.
In summary, we propose that the evolution of genomes in the genus Leishmania and, in particular, in the subgenus Mundinia was mainly shaped by host (or vector) switches.
In this work we have sequenced and analyzed genomes of several representatives of the most understudied Leishmania subgenus, Mundinia. Comparative analyses allowed us to gain additional insights into the origin of pathogenic Leishmania. We propose that the evolution of this genus was mainly driven by the host (or vector) switches.
We are grateful to the members of our laboratories for stimulating discussions.
AB, PV and VY conceived the study. AB, AYK, JS, YK, TB, LP, DHM, DŽ, JL, PAB, PV, FRO, VY analyzed and interpreted the whole-genome data. AB and VY were major contributors in writing the manuscript. All authors read and approved the final manuscript.
Support from the ERD Funds (project OPVVV 16_019/0000759 to VY, PV, JS, AYK, and JL), University of Ostrava (project SGS08/PrF/2019 to LP, DHM, and DŽ), Russian Science Foundation (project 19-15-00054 to VY), and Grant Agency of Czech Republic (project 17-10656S to VY) is kindly acknowledged. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Ethics approval and consent to participate
Consent for publication
VY is an Associate Editor of BMC Genomics. Other authors declare that they have no competing interests.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.