Objective

Pythium insidiosum is a devastating oomycete pathogen that causes pythiosis, a difficult-to-treat infectious disease with high morbidity and mortality [1,2,3,4]. The pathogen is classified in the phylum Oomycota, clade Stramenopiles, superkingdom Eukaryota (https://www.ncbi.nlm.nih.gov/taxonomy). Evolutionary processes may have played an important role in setting P. insidiosum apart from other pathogenic oomycetes, most of which infect plants, and in its emergence as a highly virulent human and animal pathogen. Typing of the ribosomal deoxyribonucleic acid (rDNA) sequence divides P. insidiosum into three major clades associated with geographic origin: clade-I strains from the Americas, clade-II strains from the Americas, Asia, and Australia, and clade-III strains mainly from Thailand and the United States [5, 6]. Miraglia et al. recently reclassified clade-III strains of P. insidiosum as Pythium periculosum based on population genetic analysis [7]. To date, draft genomes of at least 10 P. insidiosum strains have been sequenced on various next-generation sequencing (NGS) platforms (i.e., 454, Illumina, and PacBio) and deposited in public databases [8,9,10,11,12,13,14,15]. Most of these strains belong to clades I and II. In the current study, we used the MGI short-read NGS platform to obtain genome data from two additional clade-III strains isolated from human patients in Thailand and the United States. This work is part of our effort to build a comprehensive genome database covering diverse P. insidiosum strains for downstream comparative and population genomic analyses of microbial evolution and pathogenicity. The genome information obtained will support our long-term goal of unveiling the mechanisms of pathogenesis and, ultimately, identifying effective diagnostic and therapeutic targets for the pathogen.

Data description

This study reports draft genome sequences of two phylogenetic clade-III strains of P. insidiosum (reclassified as P. periculosum). The first strain, Pi057C3, was isolated from a human pythiosis patient in Thailand. The second strain, Pi050C3 (also known as strain ATCC90586), was isolated from a human pythiosis patient in the United States. Species and genotypes were identified by rDNA sequence homology analysis [5, 6]. Each strain was cultured in Sabouraud dextrose broth with shaking at 37 °C for seven days and then harvested for genomic DNA (gDNA) extraction using the method previously optimized for P. insidiosum [16]. Following the manufacturer's instructions, 300 ng of gDNA from each strain was used to construct a sequencing library with the MGIEasy FS Library Prep Kit (MGI Tech, Shenzhen, China). Paired-end sequencing (2 × 150 bp) was performed on an MGISEQ-2000RS sequencer to generate shotgun genome sequences.

A total of 46,422,128 reads (6,963,319,140 bases) were obtained from strain Pi057C3 and 66,245,520 reads (9,936,827,970 bases) from strain Pi050C3. All cleaned reads were assembled de novo using SPAdes v3.14.0 [17] with default settings, except that the k-mer sizes were set to 21, 33, 55, 77, and 99. The draft genome of strain Pi057C3 comprised 14,134 contigs with an average length of 3,010 bases (range: 200–937,540), an L50 of 241, an N50 of 45,748 bases, a total length of 42,546,069 bases, a GC content of 57.6%, and a genome coverage of 164×. The draft genome of strain Pi050C3 comprised 14,511 contigs with an average length of 2,984 bases (range: 200–964,331), an L50 of 245, an N50 of 45,208 bases, a total length of 43,294,389 bases, a GC content of 57.7%, and a genome coverage of 230×.

BUSCO v5.4.5, run against a well-defined set of 100 highly conserved Stramenopiles genes [18], estimated genome completeness at 99% for strain Pi057C3 and 96% for strain Pi050C3. Augustus v3.3.3 [19] predicted 12,147 open reading frames (ORFs) in the genome of strain Pi057C3 and 12,249 ORFs in the genome of strain Pi050C3. The predicted protein sequences from both strains were compared with the NCBI protein sequence database using BLASTP. Based on functional classification with Clusters of Orthologous Groups (COG) [20], 6.8%, 10.1%, 11.1%, and 72.0% of the genes from strain Pi057C3, and 6.6%, 9.9%, 11.0%, and 72.5% of the genes from strain Pi050C3, were assigned to (i) information storage and processing, (ii) cellular processes and signaling, (iii) metabolism, and (iv) poorly characterized or hypothetical proteins, respectively.

The assembled genome sequences have been deposited in the National Center for Biotechnology Information (NCBI) and DNA Data Bank of Japan (DDBJ) databases and are publicly accessible under accession numbers JAKCXM000000000.1 (strain Pi057C3) and JAKCXL000000000.1 (strain Pi050C3) (Table 1).
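The contig statistics reported for both assemblies (N50, L50, average length, GC content) and the genome coverage follow standard definitions. The sketch below is a minimal illustration, not the pipeline used in this study, that computes them from a list of contig sequences; the names `assembly_stats` and `fold_coverage` are hypothetical. N50 is taken as the length of the contig at which the running total, summed from the longest contig to the shortest, first reaches half of the assembly length, and L50 is the number of contigs needed to get there. Coverage is total sequenced bases divided by assembly size, which reproduces the reported 164× and 230×.

```python
"""Sketch: summary statistics for a draft assembly (hypothetical helpers).

Assumption: contigs are supplied as plain strings, e.g. parsed from a FASTA
file elsewhere. This illustrates the standard metric definitions only.
"""
from dataclasses import dataclass


@dataclass
class AssemblyStats:
    n_contigs: int
    total_bases: int
    mean_length: float
    min_length: int
    max_length: int
    n50: int
    l50: int
    gc_percent: float


def assembly_stats(contigs: list[str]) -> AssemblyStats:
    """Compute contig count, length summary, N50/L50, and GC content."""
    if not contigs:
        raise ValueError("at least one contig is required")
    lengths = sorted((len(c) for c in contigs), reverse=True)
    total = sum(lengths)

    # Walk contigs from longest to shortest; N50 is the length of the contig
    # at which the cumulative length first reaches half the total, and L50
    # is how many contigs that took.
    running = 0
    n50 = l50 = 0
    for i, length in enumerate(lengths, start=1):
        running += length
        if 2 * running >= total:
            n50, l50 = length, i
            break

    # GC content over called bases only (A, C, G, T); ambiguous bases such
    # as N are excluded from both numerator and denominator.
    gc = at = 0
    for seq in contigs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    gc_percent = 100.0 * gc / (gc + at) if gc + at else 0.0

    return AssemblyStats(
        n_contigs=len(lengths),
        total_bases=total,
        mean_length=total / len(lengths),
        min_length=lengths[-1],
        max_length=lengths[0],
        n50=n50,
        l50=l50,
        gc_percent=gc_percent,
    )


def fold_coverage(sequenced_bases: int, assembly_size: int) -> float:
    """Average sequencing depth: total sequenced bases / assembled size."""
    return sequenced_bases / assembly_size
```

For example, `fold_coverage(6_963_319_140, 42_546_069)` gives about 163.7, which rounds to the 164× reported for strain Pi057C3. Likewise, dividing the total length by the contig count (42,546,069 / 14,134) recovers the reported average contig length of about 3,010 bases.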
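The four broad COG groups used in the functional classification are unions of the single-letter COG functional categories. A minimal sketch of how per-gene COG letters could be collapsed into these groups and expressed as percentages follows. The letter-to-group table follows the standard COG scheme. Counting genes without a COG hit as poorly characterized, and using only the first letter for genes assigned several, are illustrative assumptions rather than the study's documented procedure, and `group_for` and `group_percentages` are hypothetical names.

```python
"""Sketch: collapse single-letter COG codes into the four broad COG groups.

Assumptions (not from the study): each gene carries a string of COG letters
(e.g. "KL"), or None/"" when no COG hit was found; only the first letter is
used; genes without a recognized letter count as poorly characterized.
"""
from collections import Counter
from typing import Optional

# Standard COG functional categories grouped into the four top-level classes.
COG_GROUPS = {
    "information storage and processing": set("JAKLB"),
    "cellular processes and signaling": set("DYVTMNZWUO"),
    "metabolism": set("CGEFHIPQ"),
    "poorly characterized": set("RS"),
}


def group_for(cog_letters: Optional[str]) -> str:
    """Return the broad COG group for one gene's COG letter string."""
    if not cog_letters:
        return "poorly characterized"  # no COG hit (assumption)
    first = cog_letters[0].upper()
    for group, letters in COG_GROUPS.items():
        if first in letters:
            return group
    return "poorly characterized"  # letter outside the table (assumption)


def group_percentages(gene_cogs: list[Optional[str]]) -> dict[str, float]:
    """Percentage of genes falling into each broad COG group."""
    if not gene_cogs:
        raise ValueError("at least one gene is required")
    counts = Counter(group_for(c) for c in gene_cogs)
    total = len(gene_cogs)
    return {g: 100.0 * counts[g] / total for g in COG_GROUPS}
```

Because every gene lands in exactly one group, the four percentages always sum to 100%, as the reported values for both strains do (6.8 + 10.1 + 11.1 + 72.0 and 6.6 + 9.9 + 11.0 + 72.5).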

In summary, the genomes of P. insidiosum (P. periculosum) strains Pi057C3 (genome size: 42.5 Mb) and Pi050C3 (genome size: 43.3 Mb), isolated from Thai and American patients with pythiosis, have been sequenced and made publicly available. These genome data will be analyzed as part of our population genomic study to elucidate the genome-scale biodiversity of this pathogen.

Table 1 Overview of data files/data sets

Limitations

We used the MGISEQ-2000RS sequencer to generate short reads for the draft genome assemblies of strains Pi057C3 and Pi050C3. This platform has several limitations. Like the sequencing-by-synthesis (SBS) technique used on other platforms (e.g., Illumina), MGI SBS relies on DNA amplification during library preparation, which can introduce coverage bias and substitution errors. In addition, each draft genome was assembled from a single 150-bp paired-end library. This resulted in more contigs (14,134 for strain Pi057C3 and 14,511 for strain Pi050C3) and smaller assembled genome sizes (42.5 Mb for strain Pi057C3 and 43.3 Mb for strain Pi050C3) than the reference P. insidiosum genome, which was assembled from four paired-end and mate-pair libraries and comprises fewer contigs (n = 1,192) and a larger genome (53.2 Mb) [10].