Chromosome-scale genome sequencing, assembly and annotation of six genomes from subfamily Leishmaniinae

Almutairi, Hatim; Urbaniak, Michael D.; Bates, Michelle D.; Jariyapan, Narissara; Kwakye-Nuako, Godwin; Thomaz Soccol, Vanete; Al-Salem, Waleed S.; Dillon, Rod J.; Bates, Paul A.; Gatherer, Derek

doi:10.1038/s41597-021-01017-3

Data Descriptor
Open access
Published: 06 September 2021

Volume 8, article number 234, (2021)

Scientific Data

Abstract

We provide the raw and processed data produced during the genome sequencing of isolates from six species of parasites from the subfamily Leishmaniinae: Leishmania martiniquensis (Thailand), Leishmania orientalis (Thailand), Leishmania enriettii (Brazil), Leishmania sp. Ghana, Leishmania sp. Namibia and Porcisia hertigi (Panama). De novo assembly was performed using Nanopore long reads to construct chromosome backbone scaffolds. We then corrected base-calling errors by mapping short Illumina paired-end reads onto the initial assembly. Data have been deposited at NCBI as follows: raw sequencing output in the Sequence Read Archive, finished genomes in GenBank, and ancillary data in BioSample and BioProject. Derived data such as quality scores, SAM files, genome annotations and repeat sequence lists have been deposited in Lancaster University’s electronic data archive, with DOIs provided for each item. Our analysis workflow code has been deposited in GitHub and Zenodo repositories. These data constitute a resource for the comparative genomics of parasites and for further applications in general and clinical parasitology.

Measurement(s)	DNA • genome • sequence_assembly • sequence feature annotation
Technology Type(s)	DNA sequencing • Oxford Nanopore Sequencing • Illumina sequencing • sequence assembly process • sequence annotation
Sample Characteristic - Organism	Leishmaniinae
Sample Characteristic - Location	Namibia • Thailand • Ghana • Brazil

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.15134085

Background & Summary

Leishmaniasis is a neglected tropical disease. It is considered a disease of poverty, primarily affecting low- and middle-income countries (LMICs). Leishmaniasis is caused by parasites of the genus Leishmania, of which 18 species are known to infect humans¹. A total of 98 sandfly species are suspected or confirmed vectors of Leishmania². There are three major forms of leishmaniasis: visceral leishmaniasis, also known as kala-azar, which is fatal in over 95% of cases if left untreated; cutaneous leishmaniasis, the most common form, which causes skin lesions that leave life-long scars and can result in serious disability or stigma; and mucocutaneous leishmaniasis, which leads to partial or total destruction of the mucous membranes of the nose, mouth and throat³. Over one billion people live in endemic areas and are at risk of leishmaniasis. An estimated 700,000 to 1.2 million or more new cases of cutaneous leishmaniasis occur globally each year, in over 100 countries⁴. In addition, up to 300,000 cases of visceral leishmaniasis cause more than 200,000 deaths annually⁵.

The genus Leishmania is divided into four subgenera: L. Leishmania, L. Viannia, L. Sauroleishmania and the newest subgenus, L. Mundinia, which now accommodates several species from the L. enriettii complex and others from five continents^{6–12}. The Leishmania Genome Network was initiated in 1994¹³ and, ten years later, announced the assembly of the Leishmania major Friedlin strain as the first Leishmania reference genome¹⁴. Since then, a total of 58 genomes have become publicly available, assembled to levels of completeness ranging from contigs to whole chromosomes. Prior to our project, only two L. Mundinia subgenus genomes had been sequenced and assembled: Leishmania enriettii strain LEM3045 (GCA_000410755) and Leishmania sp. MAR strain LEM2494 (GCA_000410755). The genus Porcisia is a sister genus of Leishmania within the subfamily Leishmaniinae. Prior to the release of our genome, no genome sequences were available for the genus Porcisia; the partial genome of P. deanei has since been released and published¹⁵.

We assembled and annotated the genomes of five L. Mundinia species (L. martiniquensis, L. orientalis, L. enriettii, L. sp. Ghana and L. sp. Namibia) and one species in the genus Porcisia (P. hertigi, formerly known as L. hertigi¹⁶), using Illumina and Nanopore sequencing. The two isolates from Ghana and Namibia belong to new species that have not yet been formally named. The World Health Organization (WHO) codes for the six isolates are: L. martiniquensis MHOM/TH/2012/LSCM1;LV760; L. orientalis MHOM/TH/2014/LSCM4;LV768; L. enriettii MCAV/BR/2001/CUR178;LV673; L. sp. Ghana MHOM/GH/2012/GH5;LV757; L. sp. Namibia MPRO/NA/1975/252;LV425; and P. hertigi MCOE/PA/1965/C119;LV43. Nanopore long reads were used to build the initial scaffold assemblies, and Illumina short reads were then mapped onto these scaffolds, improving the base-level accuracy of the assembled sequence while preserving whole-chromosome integrity. Final polishing, reordering and reorientation of chromosomes, along with masking and classification of repeat regions, were guided by the most closely related reference genome for each species. Annotation of the finished genomes was both evidence-based and ab initio.
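
The reference-guided reordering and reorientation step can be illustrated with a minimal Python sketch. It assumes that alignments between assembled contigs and reference chromosomes have already been parsed (for example from MUMmer coordinate output) into hypothetical `Alignment` records; the record fields, the contig and chromosome names, and the length-weighted strand vote are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """One contig-to-reference alignment (hypothetical record layout)."""
    contig: str
    ref_chrom: str
    ref_start: int  # 1-based start of the alignment on the reference
    length: int     # aligned length in bp
    forward: bool   # True if the contig aligns to the reference forward strand


_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def order_and_orient(alignments: list[Alignment]) -> list[tuple[str, str]]:
    """Place each contig on the reference chromosome it shares the most aligned
    bases with, orient it by a length-weighted strand vote, and sort contigs by
    (reference chromosome name, first aligned position)."""
    bases = defaultdict(lambda: defaultdict(int))  # contig -> ref_chrom -> aligned bp
    vote = defaultdict(int)                        # contig -> forward bp minus reverse bp
    for a in alignments:
        bases[a.contig][a.ref_chrom] += a.length
        vote[a.contig] += a.length if a.forward else -a.length

    placed = []
    for contig, per_chrom in bases.items():
        best = max(per_chrom, key=per_chrom.get)
        start = min(a.ref_start for a in alignments
                    if a.contig == contig and a.ref_chrom == best)
        placed.append((best, start, contig, "+" if vote[contig] >= 0 else "-"))

    # Plain string sort: assumes zero-padded chromosome names such as "chr01".
    placed.sort()
    return [(contig, strand) for _, _, contig, strand in placed]
```

Contigs reported with a "-" strand would then be passed through `reverse_complement` before the reordered FASTA is written.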

Figure 1 summarises data sizes and total yield per sample. The total sequencing data file size for all samples was 139.33 Gigabytes, yielding 58.70 GigaBases of sequence data from 23.71 GigaReads. Figure 2 summarises our analysis workflow. This workflow generated four main outputs for each assembly: genome, proteome, and transcriptome files in FASTA format, and a General Feature Format file (GFF) that contains the coordinates for all proteins and transcripts in the assembly.

Methods

Sample collection, sequencing and software

From the parasite cryobank at Lancaster University, we selected six samples of the species listed above that lacked publicly available reference genomes. Table 1 gives details of strains, isolates, and BioSample and BioProject accessions^{17–28}. Illumina HiSeq 4000 and MiSeq sequencing were contracted to BGI Genomics and Aberystwyth University. Nanopore sequencing was performed in-house on MinION FLO-MIN106 flow cells using the SQK-LSK109 ligation sequencing kit. Throughout the text we provide literature citations for software where available; links to both published and unpublished software used are provided in Table 2. We created public GitHub and Zenodo repositories for the analysis pipeline^{29,30}.

Table 1 Sample descriptions for all assemblies.


Table 2 Tools used in analysis workflow with conda or docker link.


Genome assembly

De novo assemblies were generated from Nanopore MinION long reads using Flye³¹. Because Nanopore long reads have relatively low base-call quality scores, we mapped high-quality Illumina short reads onto the assemblies with minimap2³² and generated corrected consensus sequences with SAMtools³³. The consensus sequences were screened for contamination and for sequences of vector origin using BLAST+³⁴ against the UniVec database³⁵. Finally, the assemblies were polished with Pilon³⁶ to minimise gaps.
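
The assembly and short-read correction steps can be chained as external commands. The sketch below only builds argument lists for Flye, minimap2, SAMtools and Pilon using documented options of those tools; the file names, directory layout and thread count are hypothetical, it assumes the `pilon` wrapper script (as installed by Bioconda) is on the PATH, and it simplifies the described workflow by passing the sorted alignments straight to Pilon, omitting the separate SAMtools consensus and UniVec screening steps. It is not the authors' Snakemake pipeline.

```python
import subprocess
from pathlib import Path


def assembly_commands(nanopore_reads, illumina_r1, illumina_r2, outdir, threads=8):
    """Build (but do not run) command lines for long-read assembly followed by
    short-read mapping and Pilon polishing."""
    out = Path(outdir)
    flye_dir = out / "flye"
    draft = flye_dir / "assembly.fasta"  # Flye's final assembly file name
    sam = out / "illumina.sam"
    bam = out / "illumina.sorted.bam"
    t = str(threads)
    return [
        # 1. De novo assembly from raw (uncorrected) Nanopore reads
        ["flye", "--nano-raw", str(nanopore_reads), "--out-dir", str(flye_dir), "--threads", t],
        # 2. Map paired Illumina reads to the draft with minimap2's short-read preset
        ["minimap2", "-ax", "sr", "-t", t, "-o", str(sam), str(draft),
         str(illumina_r1), str(illumina_r2)],
        # 3. Coordinate-sort and index the alignments (Pilon needs an indexed BAM)
        ["samtools", "sort", "-@", t, "-o", str(bam), str(sam)],
        ["samtools", "index", str(bam)],
        # 4. Polish the draft using the paired-end short-read evidence
        ["pilon", "--genome", str(draft), "--frags", str(bam),
         "--outdir", str(out / "pilon"), "--output", "polished"],
    ]


def run_all(commands):
    """Run each command in turn; check=True stops at the first failing step."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes the steps easy to inspect or log; `run_all(assembly_commands(...))` would execute them in order on a machine with the tools installed.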

Chromosome verification

For all chromosomes of each polished genome, we then ran BLAST+ (parameters: -max_target_seqs 1 -max_hsps 1) against all TriTrypDB³⁷ release-47 genomes. The output for each genome was visualized as a word cloud to suggest its closest relative among the TriTrypDB genomes³⁸. Synteny was then plotted for each genome by aligning each of its chromosomes with the corresponding chromosome of its word-cloud-predicted closest relative using MUMmer³⁹ (Fig. 3). This confirmed that the order and orientation of the chromosomes of each genome were equivalent to those of its closest TriTrypDB genome. Completion was then achieved by sorting the scaffolds and contigs and removing any duplicates using funannotate⁴⁰, followed by a final quality check with the Genome Assembly Annotation Service (GAAS).
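
The word-cloud step essentially counts which reference organism supplies the top BLAST hit for each chromosome. A minimal sketch of that tally follows; it assumes BLAST+ tabular output (`-outfmt 6`, query ID in column 1 and subject ID in column 2) and assumes, hypothetically, that each TriTrypDB subject ID begins with an organism prefix terminated by a period. The example IDs in the test are illustrative only.

```python
from collections import Counter


def closest_relative(blast_tab: str, sep: str = ".") -> list[tuple[str, int]]:
    """Tally the organism prefix of each query's first (best) hit in BLAST
    tabular output and return (prefix, count) pairs, most frequent first."""
    seen = set()
    counts = Counter()
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        query, subject = line.split("\t")[:2]
        if query in seen:
            continue  # keep only the first hit reported for each query
        seen.add(query)
        counts[subject.split(sep)[0]] += 1
    return counts.most_common()
```

The most frequent prefix is the candidate closest relative, which would then be confirmed by the synteny alignment.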

Repetitive element annotation

We identified and classified repeat regions in the polished assemblies using RepeatModeler and TEclass⁴¹. We then generated a stratified genome-wide repeat plot for each assembly³⁸ (see the L. martiniquensis example in Fig. 4) to help decide which repeats to mask with RepeatMasker.
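
A stratified repeat plot starts from per-class totals of masked sequence. The sketch below, which is not the plotting code used for Fig. 4, assumes the standard whitespace-delimited RepeatMasker `.out` table, where column 5 is the sequence name, columns 6 and 7 are the query begin and end, and column 11 is the repeat class/family. Overlapping annotations are not merged, so totals can slightly overcount.

```python
from collections import defaultdict


def repeat_bp_by_class(rm_out: str) -> dict[str, int]:
    """Sum annotated bases per top-level repeat class from a RepeatMasker .out table."""
    totals = defaultdict(int)
    for line in rm_out.splitlines():
        fields = line.split()
        if len(fields) < 11 or not fields[0].isdigit():
            continue  # header, blank or malformed line
        begin, end = int(fields[5]), int(fields[6])
        repeat_class = fields[10].split("/")[0]  # e.g. "LINE/L1" -> "LINE"
        totals[repeat_class] += end - begin + 1
    return dict(totals)
```

Dividing each total by the assembly length gives the genome fraction per class, which can then be plotted to choose which classes to mask.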

Gene prediction and functional annotation

After repeat masking, we annotated the assemblies with the MAKER2⁴² annotation pipeline in two rounds: 1) an evidence-based round using EST, mRNA-seq and protein homology evidence from TriTrypDB release-47, together with our repeat-masking output; and 2) an ab initio round using AUGUSTUS⁴³ with the pre-trained L. tarentolae model. After each round, Annotation Edit Distance (AED) scores were calculated and plotted (Fig. 5). We calculated summary statistics for each round, such as the numbers of genes and other features, using GenomeTools⁴⁴ and AGAT⁴⁵. After all annotation rounds were complete, we assigned functional annotations from the UniProt⁴⁶ and Pfam⁴⁷ databases using BLAST+ and InterProScan⁴⁸.
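
AED measures how well a gene model agrees with its aligned evidence. At the nucleotide level, with sensitivity SN = overlap / evidence length and specificity SP = overlap / model length, AED = 1 - (SN + SP) / 2, so 0 means perfect agreement and 1 means no evidential support. The sketch below is a simplified nucleotide-level version of that definition, not MAKER2's code; the interval representation and function names are assumptions for illustration. It also builds the kind of cumulative distribution typically shown in AED plots.

```python
def covered(intervals):
    """Positions covered by 1-based, inclusive (start, end) intervals, e.g. exons."""
    return {pos for start, end in intervals for pos in range(start, end + 1)}


def aed(model_exons, evidence_exons):
    """Annotation Edit Distance: 1 - (sensitivity + specificity) / 2."""
    model, evidence = covered(model_exons), covered(evidence_exons)
    overlap = len(model & evidence)
    sensitivity = overlap / len(evidence)  # share of evidence bases inside the model
    specificity = overlap / len(model)     # share of model bases supported by evidence
    return 1 - (sensitivity + specificity) / 2


def aed_cdf(scores, step=0.25):
    """Cumulative fraction of gene models with AED <= each threshold from 0 to 1."""
    steps = round(1 / step)
    thresholds = [i * step for i in range(steps + 1)]
    return [(t, sum(s <= t for s in scores) / len(scores)) for t in thresholds]
```

An annotation round whose curve rises steeply near AED = 0 has most of its models well supported by evidence.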

Analysis pipeline

To ensure that all assemblies and annotations can be reproduced by future investigators, the entire process, from retrieving the SRA data^{49–91} to assigning the annotations^{92–97}, has been implemented in Snakemake⁹⁸ and made available²⁹. This Snakemake pipeline should be readily adaptable to the sequencing of further, similar parasite genomes throughout the parasitology community³⁰.
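
Snakemake decides what to run by building a directed acyclic graph of rules from their input and output files. The following sketch models that idea with Python's standard-library `graphlib` (Python 3.9 or later); the rule names and dependencies are hypothetical simplifications of the steps described in the Methods, not the rules of the deposited pipeline, and this is not Snakemake's own API.

```python
from graphlib import TopologicalSorter

# Hypothetical rule graph: each rule maps to the rules whose outputs it consumes.
RULES = {
    "fetch_sra_reads": set(),
    "flye_assembly": {"fetch_sra_reads"},
    "map_short_reads": {"fetch_sra_reads", "flye_assembly"},
    "pilon_polish": {"map_short_reads"},
    "repeat_masking": {"pilon_polish"},
    "maker_annotation": {"repeat_masking"},
    "functional_annotation": {"maker_annotation"},
}


def execution_order(rules):
    """An order in which every rule runs only after all of its inputs exist."""
    return list(TopologicalSorter(rules).static_order())


def rerun_set(rules, changed):
    """Rules to re-run, in execution order, when `changed` regenerates its output."""
    order = execution_order(rules)
    stale = {changed}
    for rule in order:
        if rules[rule] & stale:  # depends on something stale, so it is stale too
            stale.add(rule)
    return [rule for rule in order if rule in stale]
```

Because only the affected downstream rules are re-run, an investigator can change one step (for example the repeat library) without repeating the whole assembly.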

Data Records

Table 3 details the sequencing output. Short and long reads were deposited in the NCBI Sequence Read Archive (SRA)^{49–91}. Six BioProjects^{23–28} and six BioSamples^{17–22} were also created at NCBI. The assembled genomes were deposited in NCBI Assembly^{99–104}. Additional files containing raw-read quality reports^{105–110}, mapped reads^{111–116}, classified repeated sequences^{117–122} and functional annotations^{92–97} were deposited in Lancaster University’s electronic data archive.

Table 3 Details of reads, bases and file sizes.


Technical Validation

Genomic DNA integrity

Genomic DNA was extracted using Trizol (Invitrogen) and quantified using Qubit® dsDNA HS Assay Kits (ThermoFisher Scientific) prior to sequencing. Concentrations ranged between 68.2 and 120 ng/µL. For consistency, we used the same DNA extract for all three sequencing platforms (Nanopore MinION, Illumina HiSeq 4000 and MiSeq). We also assessed whether the gDNA was of high molecular weight using the read N50 of the MinION long reads, which ranged between 12.07 and 22.92 kilobases.
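
Read N50, used here as a proxy for DNA integrity, is the length L such that reads of length L or longer together contain at least half of all sequenced bases. A minimal sketch of the calculation (the example lengths are illustrative, not data from this study):

```python
def n50(lengths):
    """Smallest length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no read lengths given")


print(n50([2, 3, 4, 5, 6]))  # 5: the 6 and 5 bp reads already hold 11 of 20 bases
```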

Contamination screening

We screened all assemblies for contamination and for sequences of vector origin by first building a BLAST database from UniVec and then searching it with BLAST+. All contaminants were found at either the beginning or the end of contigs and were deleted; no contaminants affected assembly integrity.
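
Because all contaminant hits lay at contig ends, removing them amounts to trimming each contig. The sketch below assumes hits are given as 1-based, inclusive query coordinates (for example the qstart and qend columns of BLAST tabular output); the `margin` that decides whether a hit counts as terminal is a hypothetical parameter, and internal hits are returned for manual review rather than cut out.

```python
def trim_end_contaminants(seq, hits, margin=100):
    """Trim vector/adaptor hits lying within `margin` bp of either contig end.

    Returns the trimmed sequence and a list of internal hits that were left in place.
    """
    left, right = 0, len(seq)  # 0-based, half-open slice of sequence to keep
    internal = []
    for start, end in hits:
        start, end = min(start, end), max(start, end)  # minus-strand hits may be reversed
        if start <= margin:
            left = max(left, end)             # drop 1-based positions 1..end
        elif end > len(seq) - margin:
            right = min(right, start - 1)     # drop 1-based positions start..len(seq)
        else:
            internal.append((start, end))
    return seq[left:right], internal
```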

Quality of short and long raw sequence reads

We used FastQC to check the quality of the Illumina short reads and pycoQC to check the quality of the Nanopore long reads. We used MultiQC¹²³ to combine all sequence quality scores into one interactive report^{105–110}.
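
Both FastQC and pycoQC summarise per-base Phred scores encoded in the quality lines of the reads. A minimal sketch of one such summary, the arithmetic mean Phred score per read, assuming uncompressed four-line FASTQ records with the common Phred+33 encoding (the records in the test are made up):

```python
def mean_phred_per_read(fastq_text, offset=33):
    """Yield (read_id, mean Phred score) for each four-line FASTQ record."""
    lines = fastq_text.strip().splitlines()
    for i in range(0, len(lines) - 3, 4):
        header, _seq, _plus, qual = lines[i:i + 4]
        scores = [ord(ch) - offset for ch in qual]  # Phred+33: '!' = 0, 'I' = 40
        yield header[1:].split()[0], sum(scores) / len(scores)
```

Some tools instead average per-base error probabilities before converting back to a Phred value, which gives lower numbers for reads with uneven quality, so values from different tools are not always directly comparable.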

Assembly validation

Because the analysis comprised many steps, we introduced quality checks between steps. Some checks focused on completeness, for instance using BUSCO¹²⁴ to benchmark the presence of expected universal single-copy orthologues. Other checks focused on the correct order and orientation of the chromosomes, for instance MUMmer alignment to assess synteny between our assemblies and other Leishmania genomes. Further checks focused on the accuracy and precision of annotation, for instance using Annotation Edit Distance (AED) scores from MAKER2 (Fig. 5). We checked the reproducibility of the assemblies and annotations using Snakemake.
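
Completeness checks of this kind are easier to compare across assemblies when the BUSCO summary is parsed into numbers. The sketch assumes the one-line summary string printed by BUSCO versions 4 and 5 (complete, single-copy, duplicated, fragmented, missing, and total BUSCOs searched); the example values in the test are illustrative, not results from this study.

```python
import re

_BUSCO = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(summary_line):
    """Parse a BUSCO one-line summary into percentages plus the BUSCO count n."""
    m = _BUSCO.search(summary_line)
    if m is None:
        raise ValueError("no BUSCO summary string found")
    values = {key: float(val) for key, val in m.groupdict().items() if key != "n"}
    values["n"] = int(m["n"])
    return values
```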

Code availability

The Snakemake analysis pipeline has been deposited in GitHub and Zenodo repositories^{29,30}. Links to the software used, as well as the relevant conda and docker containers, are given in Table 2.

References

Steverding, D. The history of leishmaniasis. Parasit Vectors 10, 82–91 (2017).
Maroli, M., Feliciangeli, M. D., Bichaud, L., Charrel, R. N. & Gradoni, L. Phlebotomine sandflies and the spreading of leishmaniases and other diseases of public health concern. Med Vet Entomol 27, 123–147 (2013).
Zijlstra, E. E. PKDL and other dermal lesions in HIV co-infected patients with Leishmaniasis: review of clinical presentation in relation to immune responses. PLoS Negl Trop Dis 8, e3258 (2014).
Al-Salem, W., Herricks, J. R. & Hotez, P. J. A review of visceral leishmaniasis during the conflict in South Sudan and the consequences for East African countries. Parasit Vectors 9, 460–470 (2016).
Burza, S., Croft, S. L. & Boelaert, M. Leishmaniasis. Lancet 392, 951–970 (2018).
Desbois, N., Pratlong, F., Quist, D. & Dedet, J. P. Leishmania (Leishmania) martiniquensis n. sp. (Kinetoplastida: Trypanosomatidae), description of the parasite responsible for cutaneous leishmaniasis in Martinique Island (French West Indies). Parasite 21, 12–15 (2014).
Jariyapan, N. et al. Leishmania (Mundinia) orientalis n. sp. (Trypanosomatidae), a parasite from Thailand responsible for localised cutaneous leishmaniasis. Parasit Vectors 11, 351–359 (2018).
Kwakye-Nuako, G. et al. First isolation of a new species of Leishmania responsible for human cutaneous leishmaniasis in Ghana and classification in the Leishmania enriettii complex. Int J Parasitol 45, 679–684 (2015).
Lobsiger, L. et al. An autochthonous case of cutaneous bovine leishmaniasis in Switzerland. Vet Parasitol 169, 408–414 (2010).
Muller, N. et al. Occurrence of Leishmania sp. in cutaneous lesions of horses in Central Europe. Vet Parasitol 166, 346–351 (2009).
Reuss, S. M. et al. Autochthonous Leishmania siamensis in horse, Florida, USA. Emerg Infect Dis 18, 1545–1547 (2012).
Rose, K. et al. Cutaneous leishmaniasis in red kangaroos: isolation and characterisation of the causative organisms. Int J Parasitol 34, 655–664 (2004).
Ivens, A. C. & Blackwell, J. M. The Leishmania genome comes of age. Parasitol Today 15, 225–231 (1999).
Ivens, A. C. et al. The genome of the kinetoplastid parasite, Leishmania major. Science 309, 436–442 (2005).
Albanaz, A. T. S. et al. Genome analysis of Endotrypanum and Porcisia spp., closest phylogenetic relatives of Leishmania, highlights the role of amastins in shaping pathogenicity. Genes (Basel) 12, 444–463 (2021).
Espinosa, O. A., Serrano, M. G., Camargo, E. P., Teixeira, M. M. G. & Shaw, J. J. An appraisal of the taxonomy and nomenclature of trypanosomatids presently classified as Leishmania and Endotrypanum. Parasitology 145, 430–442 (2018).
NCBI BioSample https://identifiers.org/ncbi/biosample:SAMN17294109 (2021).
NCBI BioSample https://identifiers.org/ncbi/biosample:SAMN17294111 (2021).
NCBI BioSample https://identifiers.org/ncbi/biosample:SAMN17294112 (2021).
NCBI BioSample https://identifiers.org/ncbi/biosample:SAMN17294115 (2021).
NCBI BioSample https://identifiers.org/ncbi/biosample:SAMN17294129 (2021).
NCBI BioSample https://identifiers.org/ncbi/biosample:SAMN17294121 (2021).
NCBI BioProject https://identifiers.org/ncbi/bioproject:PRJNA691531 (2021).
NCBI BioProject https://identifiers.org/ncbi/bioproject:PRJNA691532 (2021).
NCBI BioProject https://identifiers.org/ncbi/bioproject:PRJNA691534 (2021).
NCBI BioProject https://identifiers.org/ncbi/bioproject:PRJNA691536 (2021).
NCBI BioProject https://identifiers.org/ncbi/bioproject:PRJNA689706 (2021).
NCBI BioProject https://identifiers.org/ncbi/bioproject:PRJNA691541 (2021).
Almutairi, H. hatimalmutairi/LGAAP. https://doi.org/10.5281/zenodo.4663265 (2021).
Almutairi, H. et al. LGAAP: Leishmaniinae Genome Assembly and Annotation Pipeline. Microbiol Resour Announc 10, e0043921 (2021).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol 37, 540–546 (2019).
Li, H. Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences. Bioinformatics 32, 2103–2110 (2016).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008. https://doi.org/10.1093/gigascience/giab008 (2021).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421–428 (2009).
NCBI. The UniVec Database. https://www.ncbi.nlm.nih.gov/tools/vecscreen/univec/ (2016).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).
Aslett, M. et al. TriTrypDB: a functional genomic resource for the Trypanosomatidae. Nucleic Acids Res 38, D457–462 (2010).
Almutairi, H. Supplementary materials for chromosome-scale genome sequencing, assembly and annotation of six genomes from subfamily Leishmaniinae. Lancaster University https://doi.org/10.17635/lancaster/researchdata/474 (2021).
Delcher, A. L., Salzberg, S. L. & Phillippy, A. M. Using MUMmer to identify similar regions in large sequence sets. Curr Protoc Bioinformatics Chapter 10: Unit 10.3. https://doi.org/10.1002/0471250953.bi1003s00 (2003).
Palmer, J. & Stajich, J. nextgenusfs/funannotate: funannotate v1.5.3 (Version 1.5.3). Zenodo. https://doi.org/10.5281/zenodo.2604804 (2019).
Abrusan, G., Grundmann, N., DeMester, L. & Makalowski, W. TEclass–a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics 25, 1329–1330 (2009).
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12, 491 (2011).
Hoff, K. J. & Stanke, M. Predicting genes in single genomes with AUGUSTUS. Curr Protoc Bioinformatics 65, e57 (2019).
Gremme, G., Steinbiss, S. & Kurtz, S. GenomeTools: a comprehensive software library for efficient processing of structured genome annotations. IEEE/ACM Trans Comput Biol Bioinform 10, 645–656 (2013).
Dainat, J., Hereñú, D. & Pucholt, P. NBISweden/AGAT: AGAT-v0.7.0 (v0.7.0). Zenodo. https://doi.org/10.5281/zenodo.5036996 (2021).
UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res 49, D480–D489 (2021).
Mistry, J. et al. Pfam: The protein families database in 2021. Nucleic Acids Res 49, D412–D419 (2021).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957074 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957073 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957072 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957071 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957070 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957069 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957068 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957067 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957066 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957065 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957064 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957063 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957062 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957061 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957060 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957059 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957058 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957057 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957056 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957055 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957054 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957079 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957078 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957077 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957076 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957075 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957086 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957085 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957084 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957083 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957082 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957081 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957080 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957038 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957037 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957036 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957035 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957034 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957048 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957047 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957046 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957045 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRX9957044 (2021).
Almutairi, H. L. (Mundinia) martiniquensis: functional annotations. Lancaster University https://doi.org/10.17635/lancaster/researchdata/446 (2021).
Almutairi, H. L. (Mundinia) orientalis: functional annotations. Lancaster University https://doi.org/10.17635/lancaster/researchdata/449 (2021).
Almutairi, H. L. (Mundinia) enriettii: functional annotations. Lancaster University https://doi.org/10.17635/lancaster/researchdata/452 (2021).
Almutairi, H. L. (Mundinia) sp. Ghana: functional annotations. Lancaster University https://doi.org/10.17635/lancaster/researchdata/455 (2021).
Almutairi, H. L. (Mundinia) sp. Namibia: functional annotations. Lancaster University https://doi.org/10.17635/lancaster/researchdata/458 (2021).
Almutairi, H. Porcisia hertigi: functional annotations. Lancaster University https://doi.org/10.17635/lancaster/researchdata/461 (2021).
Mölder, F. et al. Sustainable data analysis with Snakemake. F1000Research 10, https://f1000research.com/articles/10-33/v2 (2021).
NCBI Assembly https://identifiers.org/insdc.gca:GCA_017916325.1 (2021).
NCBI Assembly https://identifiers.org/insdc.gca:GCA_017916335.1 (2021).
NCBI Assembly https://identifiers.org/insdc.gca:GCA_017916305.1 (2021).
NCBI Assembly https://identifiers.org/insdc.gca:GCA_017918215.1 (2021).
NCBI Assembly https://identifiers.org/insdc.gca:GCA_017918225.1 (2021).
NCBI Assembly https://identifiers.org/insdc.gca:GCA_017918235.1 (2021).
Almutairi, H. L. (Mundinia) martiniquensis raw reads quality reports. Lancaster University https://doi.org/10.17635/lancaster/researchdata/437 (2021).
Almutairi, H. Leishmania (Mundinia) orientalis raw reads quality reports. Lancaster University https://doi.org/10.17635/lancaster/researchdata/438 (2021).
Almutairi, H. Leishmania (Mundinia) enriettii raw reads quality reports. Lancaster University https://doi.org/10.17635/lancaster/researchdata/439 (2021).
Almutairi, H. Leishmania (Mundinia) sp. Ghana raw reads quality reports. Lancaster University https://doi.org/10.17635/lancaster/researchdata/440 (2021).
Almutairi, H. Leishmania (Mundinia) sp. Namibia raw reads quality reports. Lancaster University https://doi.org/10.17635/lancaster/researchdata/441 (2021).
Almutairi, H. Porcisia hertigi raw reads quality reports. Lancaster University https://doi.org/10.17635/lancaster/researchdata/442 (2021).
Almutairi, H. L. (Mundinia) martiniquensis: mapped reads in SAM and BAM format. Lancaster University https://doi.org/10.17635/lancaster/researchdata/444 (2021).
Almutairi, H. L. (Mundinia) orientalis: mapped reads in SAM and BAM format. Lancaster University https://doi.org/10.17635/lancaster/researchdata/447 (2021).
Almutairi, H. L. (Mundinia) enriettii: mapped reads in SAM and BAM format. Lancaster University https://doi.org/10.17635/lancaster/researchdata/450 (2021).
Almutairi, H. L. (Mundinia) sp. Ghana: mapped reads in SAM and BAM format. Lancaster University https://doi.org/10.17635/lancaster/researchdata/453 (2021).
Almutairi, H. L. (Mundinia) sp. Namibia: mapped reads in SAM and BAM format. Lancaster University https://doi.org/10.17635/lancaster/researchdata/456 (2021).
Almutairi, H. Porcisia hertigi: mapped reads in SAM and BAM format. Lancaster University https://doi.org/10.17635/lancaster/researchdata/459 (2021).
Almutairi, H. L. (Mundinia) martiniquensis: classified repeated sequences. Lancaster University https://doi.org/10.17635/lancaster/researchdata/445 (2021).
Almutairi, H. L. (Mundinia) orientalis: classified repeated sequences. Lancaster University https://doi.org/10.17635/lancaster/researchdata/448 (2021).
Almutairi, H. L. (Mundinia) enriettii: classified repeated sequences. Lancaster University https://doi.org/10.17635/lancaster/researchdata/451 (2021).
Almutairi, H. L. (Mundinia) sp. Ghana: classified repeated sequences. Lancaster University https://doi.org/10.17635/lancaster/researchdata/454 (2021).
Almutairi, H. L. (Mundinia) sp. Namibia: classified repeated sequences. Lancaster University https://doi.org/10.17635/lancaster/researchdata/457 (2021).
Almutairi, H. Porcisia hertigi: classified repeated sequences. Lancaster University https://doi.org/10.17635/lancaster/researchdata/460 (2021).
Ewels, P., Magnusson, M., Lundin, S. & Kaller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
Seppey, M., Manni, M. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness. Methods Mol Biol 1962, 227–245 (2019).


Acknowledgements

We thank the Ministry of Health and Public Health Authority of Saudi Arabia for funding.

Author information

Authors and Affiliations

Division of Biomedical & Life Sciences, Faculty of Health & Medicine, Lancaster University, Lancaster, LA1 4YT, UK
Hatim Almutairi, Michael D. Urbaniak, Michelle D. Bates, Rod J. Dillon, Paul A. Bates & Derek Gatherer
Ministry of Health, Riyadh, Saudi Arabia
Hatim Almutairi & Waleed S. Al-Salem
Department of Parasitology, Faculty of Medicine, Chulalongkorn University, Bangkok, 10330, Thailand
Narissara Jariyapan
Department of Biomedical Sciences, School of Allied Health Sciences, College of Health & Allied Sciences, University of Cape Coast, Cape Coast, Ghana
Godwin Kwakye-Nuako
Laboratório de Biologia Molecular, Programa de Pós Graduação em Engenharia de Bioprocessos e Biotecnologia, Universidade Federal do Paraná, Curitiba, Brazil
Vanete Thomaz Soccol

Corresponding author

Correspondence to Derek Gatherer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.

About this article

Cite this article

Almutairi, H., Urbaniak, M.D., Bates, M.D. et al. Chromosome-scale genome sequencing, assembly and annotation of six genomes from subfamily Leishmaniinae. Sci Data 8, 234 (2021). https://doi.org/10.1038/s41597-021-01017-3

Received: 07 May 2021
Accepted: 09 August 2021
Published: 06 September 2021
DOI: https://doi.org/10.1038/s41597-021-01017-3
Springer Nature Limited
