Introduction

Schistosomiasis remains one of the most prevalent neglected tropical diseases affecting human populations in many parts of Africa, Asia, and South America. The World Health Organization (WHO) estimated that more than 218 million people in 78 countries required preventive chemotherapy in 2015 [1]. The WHO has recommended a multifaceted strategic plan for control and prevention of schistosomiasis, including large-scale chemotherapy for high-risk populations, hygiene education, access to safe drinking water, and snail control [2].

Fresh water pulmonate snails from the genus Biomphalaria act as the obligatory intermediate host for Schistosoma mansoni, the trematode species that causes intestinal schistosomiasis. A recent whole genome analysis for Biomphalaria glabrata, the major intermediate host in the Western Hemisphere, identified several potential targets for developing novel control measures [3]. In Egypt, where schistosomiasis dates to antiquity [4,5,6], Biomphalaria alexandrina is the predominant intermediate host for S. mansoni [7,8,9]. The presence of B. alexandrina is a key factor that determines the prevalence of intestinal schistosomiasis in the country.

In 2016, the Ministry of Health and Population of Egypt (MoHP) and WHO conducted a mapping of S. mansoni infection in five Nile Delta governorates [10]. The results of this project showed that prevalence rates ranged from 4.7% in Qalyubia Governorate to 17.6% in Sharqia Governorate, with an average prevalence of 10.7% in the five governorates surveyed. These observations indicate that previous assessments may have underestimated the extent of infection.

Efforts to control snail populations, such as molluscicides or introduction of predator species, have yielded only modest results. One potential target to be considered for snail control is its central nervous system (CNS), since it regulates vital functions including cardiac activity, feeding and reproductive behavior. In the present investigation, a neural transcriptome was generated with the ultimate goal of identifying signaling mechanisms involved in the response of B. alexandrina to infection by S. mansoni larvae. Such mechanisms may lead to novel and highly specific strategies for the control of snail populations.

Main text

Methods

Sample collection and preparation

Specimens of B. alexandrina (Fig. 1a) used in the present study were descendants from a field population that was collected from Giza Governorate in Egypt in 2012 and shipped to the Faculty of Medicine, Dalhousie University, Canada. They were maintained under a 14:10 light–dark cycle and fed romaine lettuce ad libitum. The central nervous systems (Fig. 1b) were dissected, immediately submerged in RNAlater (Thermo Fisher Scientific, USA), and stored at 4 °C for further analysis. Five pooled nervous systems were homogenized in lysis-binding solution provided in the RNAqueous-Micro Total RNA Isolation Kit (Thermo Fisher Scientific, USA). Total RNA was isolated following the manufacturer’s instructions. The RNA was quantified using a NanoDrop spectrophotometer and its quality was verified with an agarose gel. Samples were then sent to the Genomic Sequencing and Analysis Facility (GSAF) at the University of Texas at Austin for library preparation and Illumina sequencing. PolyA RNA selection was implemented using the Poly(A)Purist MAG Kit (Life Technologies). The mRNA quality was assessed with an Agilent Bioanalyzer. A non-directional RNA-seq library (cDNA inserts of approximately 384 bp) was generated using NEBNext Module Components. Sequencing was performed on an Illumina HiSeq 4000 sequencer (Paired End 2 × 150 bp).

Fig. 1
figure 1

Source of RNA used to generate transcriptome. a Biomphalaria alexandrina specimen. Calibration bar: 1 mm. b Dissected central nervous system. B g., buccal ganglion; C g., cerebral ganglion; V g., visceral ganglion. Calibration bar: 200 μm

Transcriptome preprocessing, assembly and annotation

Quality based trimming was implemented using Trimmomatic v0.33 [11] followed by K-mer spectral analysis to remove low abundance K-mers using the Khmer 2.0 package [12]. FastQC v0.11.3 was used to check data quality before and after trimming [13]. Filtration of input sequences produced ~ 14.4 million paired end (PE) reads and ~ 5.4 million single end (SE) reads. High-quality fragments were pooled for de novo transcriptome assembly using Trinity v2.2.0 [14] producing 149,545 transcripts. Several filtration procedures were performed to clean up the assembly: SeqClean was used to trim poly-A tails and remove 38 low complexity sequences [15]. Custom scripts were used to remove 745 fragments smaller than 200 bp. Back-mapping of input sequencing reads to the assembly using Salmon software, allowed removal of 4526 uncovered isoforms [16]. Finally, scanning the new assembly against the UniVec Database containing vector and artificial sequences eliminated 23 transcripts and trimmed an additional 98 transcripts [17]. In total, all filtration steps excluded 5332 transcripts. The final assembly was composed of 144,213 isoforms that belong to 128,739 genes. The total assembly size was 82.7 Mb with N50 equal to 716 base pairs. (Assembly statistics: Table 1). To evaluate the quality of the final assembly, another round of back mapping of input sequencing reads to the assembly was performed. PE and SE reads showed mapping rates of about 96 and 89% respectively. For benchmarking, Universal Single-Copy Orthologs (BUSCO v.2) software was used to assess annotation completeness against single-copy orthologs in Metazoa (978 orthologs) [18, 19]. BUSCO analysis was able to identify 79.5, 19.5, 1.1% complete, fragmented, and missing gene models. To assess expression abundance of assembled transcripts, mapped sequence reads were normalized into transcript per million scale (TPM) and logarithmic transformation of the normalized expression was plotted as a histogram (Fig. 2). Most of the transcripts had expression abundance levels around 1.5 TPM but only 848 transcripts had TPM values greater than 100 (Expression statistics: Table 2).

Table 1 Transcriptome assembly statistics
Fig. 2
figure 2

Histogram of log TPM expression

Table 2 Expression abundance statistics

For annotation, TransDecoder v2.0.1 was used to predict open reading frames (ORFs) [20]. A reciprocal BLAST search was implemented between the final assembly and target databases using BLAST plus v2.2.30. Significant BLAST hits (E-values < 10e−5) were utilized by crb-BLAST software to identify orthologous transcripts [21]. Transcripts with significant BLAST hits that failed to find a significant ortholog were annotated by the best BLAST hit as a candidate homolog.

Results

Annotation of all transcripts in the final assembly predicted 40,408 transcripts with long open reading frames (≥100 amino acids) (Additional file 1). Orthologs of 15,246 transcripts were identified by conditional BLAST search against the UniProtKB/Swiss-Prot database (Additional file 2). Homologs for an additional 16,810 transcripts were identified by the best BLAST hit against the same database (Additional file 3).

Each transcript in the FASTA file of the transcriptome was annotated with its transcript length and expression level in TPM values. Also, if identified, transcripts were annotated by their orthologs. Otherwise, annotation was implemented using homologs or with possible ORFs (Additional file 4).

Of the 848 highly expressed transcripts, approximately 49% corresponded to orthologs in the Swiss-Prot database. An additional 11% had homologs from a BLAST search against the same database. Another 17% had long ORFs, > 60% of which were complete, but failed to attain significant BLAST hit against the Swiss-Prot database. (Annotation Statistics: Table 3).

Table 3 Annotation statistics

The 848 highly expressed transcripts were selected for a second reciprocal BLAST search against the comprehensive NCBI Non-Redundant (NR) BLAST database. About 55% of transcripts were assigned an ortholog and 9% were annotated with a best homolog. Only 7% of the highly expressed transcripts had long ORF but failed to achieve a significant hit against the NR database. A new FASTA file for the highly expressed transcripts was annotated by the BLAST results against the NR database (Additional file 5). A reciprocal BLAST search against the recently published transcriptome of twelve pooled B. glabrata tissues [3] showed 22,613 orthologs with average identity 99.1% (Additional file 6).

Discussion

The relatively simple nervous systems of gastropod mollusks contain large identified neurons that allow detailed electrophysiological, biochemical, and molecular analyses at the cellular level [22,23,24]. Gastropods therefore serve as promising models for neurobiological studies exploring the cellular basis of behavior, including sensorimotor integration [25, 26], central pattern generator (CPG) networks [27, 28], neuroendocrine regulation of reproduction [29, 30], and responses to parasitism [31, 32]. The transcriptome of the B. alexandrina CNS complements the whole genome characterization of B. glabrata [3], and provides a resource for future investigation of parasite-host interactions with biomedical implications in a highly tractable nervous system. This transcriptome should also lead to novel strategies directed toward snail control.

Limitations

Potential limitations may include:

  1. 1.

    Short read sequencing can produce an incomplete gene model leading to inaccurate identification of multiple isoforms for the same gene.

  2. 2.

    RNA preparation with Poly(A) selection allows us to enrich for non-ribosomal transcripts but also will result in loss of non-polyadenylated transcripts.

  3. 3.

    Lack of biological and technical replicates.