Trypanosoma cruzi specific mRNA amplification by in vitro transcription improves parasite transcriptomics in host-parasite RNA mixtures
- 882 Downloads
Trypanosomatids are a group of protozoan parasites that includes the etiologic agents of important human illnesses as Chagas disease, sleeping sickness and leishmaniasis. These parasites have a significant distinction from other eukaryotes concerning mRNA structure, since all mature mRNAs have an identical species-specific sequence of 39 nucleotides at the 5′ extremity, named spliced leader (SL). Considering this peculiar aspect of trypanosomatid mRNA, the aim of the present work was to develop a Trypanosoma cruzi specific in vitro transcription (IVT) linear mRNA amplification method in order to improve parasite transcriptomics analyses.
We designed an oligonucleotide complementary to the last 21 bases of T. cruzi SL sequence, bearing an upstream T7 promoter (T7SL primer), which was used to direct the synthesis of second-strand cDNA. Original mRNA was then amplified by IVT using T7 RNA polymerase. T7SL-amplified RNA from two distinct T. cruzi stages (epimastigotes and trypomastigotes) were deep sequenced in SOLiD platform. Usual poly(A) + RNA and and T7-oligo(dT) amplified RNA (Eberwine method) were also sequenced. RNA-Seq reads were aligned to our new and improved T. cruzi Dm28c genome assembly (PacBio technology) and resulting transcriptome pattern from these three RNA preparation methods were compared, mainly concerning the conservation of mRNA transcritional levels and DEGs detection between epimastigotes and trypomastigotes.
T7SL IVT method detected more potential differentially expressed genes in comparison to either poly(A) + RNA or T7dT IVT, and was also able to produce reliable quantifications of the parasite transcriptome down to 3 ng of total RNA. Furthermore, amplification of parasite mRNA in HeLa/epimastigote RNA mixtures showed that T7SL IVT generates transcriptome quantification with similar detection of differentially expressed genes when parasite RNA mass was only 0.1% of the total mixture (R = 0.78 when compared to poly(A) + RNA).
The T7SL IVT amplification method presented here allows the detection of more potential parasite differentially expressed genes (in comparison to poly(A) + RNA) in host-parasite mixtures or samples with low amount of RNA. This method is especially useful for trypanosomatid transcriptomics because it produces less bias than PCR-based mRNA amplification. Additionally, by simply changing the complementary region of the T7SL primer, the present method can be applied to any trypanosomatid species.
KeywordsTrypanosoma cruzi Trypanosomatid RNA-Seq Transcriptomics RNA amplification T7 RNA polymerase
Coding DNA sequence
Differentially expressed genes
T. cruzi epimastigotes
False discovery rate
in vitro transcription
log2 of counts per million reads
Principal component analysis
Supra Genes (cluster of orthologous genes in T. cruzi genome)
Oligonucleotide containing a T7 RNA pol promoter and a downstream poly-T sequence
Oligonucleotide complementary to the last 21 bases of T. cruzi SL sequence, bearing an upstream T7 RNA pol promoter
Transcriptomics is a relatively new area of life sciences that aims to analyze the complete set of expressed RNAs in a specific cell or organism, under specific conditions, using high-throughput techniques . Initially based on DNA microarrays , nowadays transcriptomics is mainly performed by RNA-Seq . This method consists in deep sequencing of cDNA, followed by transcript identification and quantification based on the alignment of millions of sequences (reads) to a reference genome , resulting in a quantitative digital map of RNA expression . RNA-Seq is a very powerful technique [1, 6], but one limitation is the initial RNA mass required for entering the pipeline (generally 100 ng for poly(A) + RNA). To circumvent input limitations, RNA amplification is an interesting option, and recent developments have enabled transcriptome analysis from few or even single cells [7, 8].
Three main RNA amplification methods are mostly used: (i) in vitro transcription (IVT), (ii) PCR-based and (iii) rolling circle amplification (reviewed by ). IVT protocol was initially developed in the Eberwine laboratory , and is based on reverse transcription with an oligo(dT) primer bearing an upstream T7 promoter, which directs in vitro transcription with T7 RNA polymerase (amplification) after synthesis and purification of double-strand cDNA. The linear amplified material provides a more precise estimation of transcript levels in comparison to PCR-based exponential amplification . As microarrays required a higher RNA mass when compared to RNA-Seq, IVT was commonly used to produce enough material for array hybridization , and no significant bias was introduced by RNA amplification [12, 13, 14]. Besides, a more reproducible expression profile from a wide range of RNA inputs is produced, improving the reliability of array results regardless of sample expansion [12, 13]. Hence, IVT mRNA amplification was extensively used for preparing RNA samples for array analysis [15, 16] and it was recently adapted for RNA-Seq  and single-cell RNA-Seq [18, 19, 20].
Trypanosomatids are a group of human pathogenic parasites that includes the etiological agents of Chagas disease (Trypanosoma cruzi), sleeping sickness (Trypanosoma brucei) and leishmaniasis (Leishmania sp) . Trypanosomatids have a significant distinction from other eukaryotes concerning RNA transcription and processing: all mature mRNAs are produced after the processing of polycistronic precursors by concerted trans-splicing and polyadenylation reactions . Through trans-splicing, all mature mRNA have the same 39 nucleotides species-specific sequence at the 5′ region, called mini-exon or spliced leader (SL) . This characteristic allows the use of the SL sequence for specific analyzes of all mRNA from trypanosomatids. The “spliced leader trapping” method  enriches for 5´-SL extremity of mRNA by using a SL specific primer for second-strand cDNA synthesis and to further amplify 5´-SL containing fragments by PCR, enabling simultaneous mapping of 5’splice sites and profiling of corresponding gene expression on RNA-Seq experiments. Similar protocols have been used to map splice sites [25, 26, 27] and to measure changes in splice site usage during T. brucei life cycle [24, 26]. Detailed protocols to generate RNA-Seq libraries enriched for 5′-SL mRNA extremity were recently published [28, 29] and a web server platform for quantitative identification of SL and polyadenylation sites in kinetoplastid genomes is also available . Recently, using a method similar to SL trapping, Mulindwa and collaborators  used SL primers to synthesize T. brucei-specific cDNA on host-parasite RNA mixtures, followed by PCR amplification using nested primers to amplify parasite mRNA. Although introducing significant bias in comparison to poly(A) + RNA, the authors considered that PCR based amplification method of SL-containing mRNAs allowed comparison of different samples as long as they were all treated equally .
Here, we describe the use of T. cruzi SL sequence for parasite-specific mRNA amplification. We demonstrate a T. cruzi specific IVT amplification method based in second-strand cDNA synthesis using a SL primer bearing an upstream T7 promoter (T7SL). RNA-Seq analyses of two distinct forms of T. cruzi (epimastigotes and trypomastigotes) show that T7SL amplification does not introduce significant bias in RNA-Seq quantification and also allows parasite transcriptome analysis in mixed RNA samples. This method presents better performance, measured by Pearson correlation, than the recently published PCR-based T. brucei mRNA amplification , enabling better transcriptome quantification, especially for samples with low amount of RNA or host-parasite mixtures.
Parasite culture: T. cruzi Dm28c epimastigotes  were cultured at 28 °C in LITB medium supplemented with 10% heat-inactivated fetal bovine serum (FBS) as previously described . Tissue culture-derived trypomastigote forms were obtained by infection of cultured Vero cell (ATCC® CCL-81™) at 37 °C in a humidified 5% CO2 atmosphere using a multiplicity of infection (MOI) of 10 parasites per host cell. Trypomastigotes were recovered after 4 days of infection, in the cell burst peak.
Fluorescence activated cell sorting (FACS)
Flow cytometry and cell sorting experiments were performed in a FACSAriaII (Becton-Dickinson, San Jose, CA, USA) using a 85 μm nozzle and 45 psi sheet pressure. A total of 10,000 events were acquired in the regions previously identified as corresponding to T. cruzi cells . For epimastigote cell sorting, a FSC-A vs SSC-A gate comprising about 90% of all singlets were used. One hundred thousand (105) epimastigotes were sorted directly to 700 μl of RNA extraction buffer (buffer RLT of Qiagen RNeasy kit) kept under 4 °C during sorting. Total sorting volume was ~200 μl. To the resulting cell lysate (900 μl), 250 μl of 100% ethanol was added and RNA was purified by RNA clean-up protocol of Qiagen handbook with additional on-column DNase digestion step.
Total RNA was extracted from T. cruzi epimastigote and trypomastigote forms (5 × 108 cells) with the RNeasy kit (Qiagen, Hilden, Germany), according to the manufacturer’s instructions, with an additional on-column DNase digestion step. Total RNA from HeLa cells were obtained from MessageAmp™ II aRNA Amplification Kit from Life Technologies® (#AMB1751-5). RNA integrity was assessed on an Agilent 2100 Bioanalyzer with RNA 6000 Nano LabChip kit, according to the manufacturer’s instructions.
Polyadenylated RNA were purified from at least 50 μg of total RNA using PolyA + Track® mRNA Isolation System III from Promega® (#Z5300), according to the manufacturer’s instructions.
All mRNA amplification reactions were done using reagents from MessageAmp™ II aRNA Amplification Kit from Life Technologies® (#AMB1751-5). For classic Eberwine amplification method, first-strand cDNA was synthesized using a T7oligo(dT) primer that contains a T7 RNA polymerase promoter upstream to the poly-T tract; this promoter will further direct the in vitro mRNA synthesis (amplification). First, 100 ng of total RNA (in 11 μl) were mixed with first-strand cDNA reaction mix containing 1 μl of T7oligo(dT) primer, 2 μl first-strand buffer, 4 μl dNTPs, 1 μl RNase inhibitor and 1 μl ArrayScript Reverse Transcriptase and incubated for 2 h at 42 °C. Second-strand cDNA was then synthesized for 2 h at 16 °C using the total 20 μl first-strand product plus 80 μl of second strand mix containing 63 μl water, 10 μl second-strand buffer, 4 μl dNTPs, 2 μl DNA polymerase and 1 μl of RNase H. Double-strand cDNA was then purified using PureLink® PCR Micro Kit from Life Technologies (#K310250) according to manufacturer’s instructions. Eluted cDNA was adjusted to 16 μl and used as template for amplification reaction in a total volume of 40 μl containing 16 μl NTPs, 4 μl amplification buffer and 4 μl of T7 RNA polymerase; in vitro transcription took 14 h at 37 °C. Amplified RNA (aRNA) was then purified using RNeasy MinElute CleanUp Kit (Qiagen, #74204), according to manufacturer’s instructions.
For specific amplification of spliced leader containing mRNA, we used a custom designed primer complementary to the last 21 bases of T. cruzi spliced leader (in bold) with an upstream T7 RNA polymerase promoter (in lower case) (5′-ggccagtgaattgtaatacgactcactatagggaGGCGGTACAGTTTCTGTACTATATTG-3′), which we named T7SL. In this case, T7SL was used for second-strand cDNA synthesis, while first-strand cDNA was produced using random primers, according to the second round amplification protocol of MessageAmp™ II aRNA Kit. Template total RNA was adjusted to 10 μl, mixed with 2 μl of “second round primers” (random primers) and incubated at 70 °C for 10 min followed by snap cooling on ice to allow annealing of random primers to the RNA. Then, 8 μl of the reverse transcription master mix was added and first-strand cDNA was synthesized at 42 °C for 2 h. RNase H was added (1 μl) and the sample incubated at 37 °C for 30 min to specifically degrade the remaining RNA. T7SL primer was added to a final concentration of 1 μM and the sample was incubated at 70 °C for 10 min and then placed on ice. Second-strand cDNA master mix without RNase H was added (74 μl) and the reaction incubated for 2 h at 16 °C. Double-strand cDNA purification, in vitro transcription and aRNA purification was performed as described above.
For a more stringent second-strand cDNA synthesis directed by T7SL primer, we also developed a modified protocol using Platinum® Pfx DNA Polymerase (Life Technologies, #11708013) for second-strand cDNA synthesis at 68 °C instead of 16 °C, and primer annealing at a higher temperature, when using the DNA polymerase of MessageAmp™ II aRNA Kit. In this case, after RNA degradation with RNase H, single-strand cDNA were mixed with T7SL primer (1 μM) and Platinum® Pfx DNA polymerase mix (5 μl 10× Pfx amplification buffer, 1 μl 50 mM MgSO4 and 1 μl polymerase) in a final volume of 50 μl. Sample was placed on a thermal cycler as following: 95 °C for 3 min (denaturing), 55 °C for 10 min (T7SL annealing) and 68 °C for 30 min (second-strand cDNA synthesis). cDNA purification, in vitro transcription and aRNA purification was performed as described above.
RNA quantification and length distribution assessment
Amplified RNA was quantified using Qubit RNA HS Assay Kit (#Q32852) in a Qubit® 2.0 fluorometer (Life Technologies). For length distribution analysis, aRNA concentration was adjusted to 1 ng μl−1 and analyzed on an Agilent 2100 Bioanalyzer with RNA 6000 Pico LabChip kit, according to the manufacturer’s instructions.
Sample preparation and sequencing
Negative control aRNA samples, generated from pure HeLa RNA, were sequenced in Omega Bio-Tek Inc. (Norcross, GA, USA) using the Illumina 2 × 100 paired-end sequencing method. All other samples were sequenced in the DNA Sequencing platform of Instituto Carlos Chagas (Curitiba, PR, Brazil) using SOLiD™ 1 × 50 single-end sequencing method.
For SOLiD™ sequencing, Total RNA-Seq Kit (Life Technologies, #4445374) was used accordingly to manufacturer’s instructions. Briefly, RNA was fragmented using RNase III and cleaned up with Invitrogen RiboMinus™ Concentration Module. Strand-specific adaptors were hybridized and ligated to both extremities of RNA in an overnight reaction, followed by first-strand cDNA synthesis with reverse transcriptase. cDNA was purified using MinElute® PCR Purification Kit (Qiagen) and gel-based selected for size using a Novex TBE-Urea 6% gel (#EC6865bOX). On-gel cDNA was amplified by PCR using SOLiD™ RNA barcoding kit primers (Life Technologies, #4427046 and #4453189) and purified with Invitrogen PureLink® PCR Micro Kit. Amplified DNA was quantified with Qubit® dsDNA HS Assay Kit on a Qubit® 2.0 fluorometer. Equal masses of each sample, containing specific barcodes, were pooled together and the mixture used as DNA template for emulsion PCR on Applied Biosystems SOLiD® EZ Bead™ E80 system (#4453095). After 3′ end modification with terminal transferase, template beads were deposited onto glass slides and libraries sequenced on a SOLiD™ 4 system using multiplex fragment sequencing protocol that generates about 700 millions 50 bases short reads per run.
Illumina 2 × 100 paired-end RNA-seq reads were aligned to the T. cruzi Dm28c genome (GenBank accession number MBSY00000000) and human genome (version 19) using Bowtie2 v. 2.3.0  in local mode, within three different settings: using default parameters, −-very-fast-local and --very-sensitive-local. Only reads mapped with an score higher than 180 were used for posterior analyses.
SOLiD 1 × 50 single-end reads were aligned using SHRiMP2 v. 2.2.3  with the following arguments: --strata -h 80% --local --max-alignments 1000, in order to consider all top scoring multiple alignments of each read since the T. cruzi genome is highly repetitive [37, 38]; multiple alignments reads were further randomly attributed to a single aligned position. Only reads mapped with an score higher than 300 were used for posterior analyses.
For RNA-Seq gene coverage analysis, genes were divided in 100 equally proportional bins and the number of reads aligned to each bin were summed up from the SAM file  using custom Perl scripts; percentage of maximum coverage were calculated for each bin considering the higher count bin (for each sample) as 100%. In order to exclude redundancy in T. cruzi gene annotation and read counts, clusters of orthologous genes were determined by MCL algorithm v. 10-148, with an inflation value –I of 5.0 , based on nucleotide sequence similarity produced by BLAST+ v. 2.4.0 analysis and named hereafter Supra Genes (SG) (Additional file 1). SG read counts were obtained by summing the number of reads aligned to each coding sequence (CDS) of all members of SGs using the Perl programming language script. Low expressed Supra Genes were excluded from the analysis if counts per million (CPM) was smaller than one for at least two samples. Across sample SG count normalization was done by the TMM method  and differentially expressed Supra Genes detected by using the R v. 3.2.3 Bioconductor v. 3.5 package edgeR v. 3.12.0 , using a negative binomial model and an exact test based on quantile-adjusted conditional maximum likelihood (qCML) method. Alignment visualization was done in IGV v. 2.3.92 software . Pearson correlations were calculated in R v. 3.3.1 using log2 of SG read counts.
To compare our transcriptome data to that from Li and collaborators , we took the raw read counts results from the SAM file (Table S6 from ) and correlated the CL Brener gene IDs to Dm28c Supra Gene IDs by BLAST+ v. 2.4.0 blastn search using the Esmeraldo haplotype data as query (Additional file 2), with default parameters except an e-value threshold of 1 × 10−5. The present results and Li and collaborators  data were processed together using our pipeline of RNA-Seq counts normalization, differential expression detection and correlation analysis as described above.
All RNA-Seq data have been deposited in NCBI’s Gene Expression Omnibus and are accessible through GEO Series accession number GSE94766.
Establishment of T7SL IVT method and comparison to T7oligo(dT) IVT
RNA amplified by T7SL IVT showed a smooth length distribution, consistent with a global amplification of all mRNAs of the parasite (Fig 1b). Although T7oligo(dT) showed a similar pattern, a RNA spike approximately at 600 nt was always present (Fig. 1b). When compared to purified poly(A) + RNA from T. cruzi, both amplification methods showed a smaller median RNA size (Fig. 1b and c). As discussed below, this is probably due to the 5′ and 3′ bias in T7SL and T7oligo(dT) amplification methods, respectively. The aRNA yield was similar for both methods (Fig. 1d); starting with 100 ng of total RNA, T7SL and T7oligo(dT) IVT generate an average of 14 and 21 μg of aRNA, respectively. Considering an empiric estimation that 3% of total parasite RNA correspond to mRNA (3 ng on a 100 ng total RNA sample), these yields are equivalent to 4718 and 7000-fold mass amplification of input mRNA, respectively.
RNA-seq reads were aligned to our first version of T. cruzi Dm28c genome produced using Pacific Bioscience Technologies (PacBio) . As this third generation sequencing technology generates reads in average longer than 8 kb, this draft version is of higher quality than the usual CL Brener reference genome, lowering the number of contigs from 32,746 to 1030, increasing the size of the larger contig from 0.26 MB to 1.54 MB and increasing contig N50 from 15 kb to 133 kb . These improvements, as well as the fact that our mRNA samples were derived from Dm28, improve the alignment of RNA-Seq reads (68% of the reads aligned in a single position in Dm28c genome, instead of 56% in the CL Brener genome). There is another version of the Dm28c genome available at GenBank (AYLP00000000.1), but it has lower quality, as this version was sequenced using 454 technology and hence has a more fragmented contig distribution, with its larger contig having only 0.09 MB, besides the higher error rate in homopolymer tracts typical of pyrosequencing. This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession MBSY00000000. The version described in this paper is version MBSY01000000.
After aligning the RNA-Seq reads to T. cruzi genome, per Supra Gene read counts were computed. When comparing the technical replicates (same total RNA sample used as template for two independent amplification reactions), both IVT amplification methods have a very high correlation between technical replicates (Fig. 2c). After summing up read counts for IVT technical replicates, we also assessed the biological degree of reproducibility, using two independent biological samples for the same parasite stages, epimastigote and trypomastigote. For both IVT methods, biological variance is similar to the technical variance (Fig. 2d). Interestingly, the T7SL method presented a slightly higher average biological correlation in comparison to T7dT and poly(A) + RNA (Additional file 4: Table S1).
We computed the read count fold change (log2) for all parasite genes, and showed that epimastigote to trypomastigote fold changes are similar for all three methods (Fig. 3b). Higher correlation were obtained when comparing T7SL IVT to either one of the other methods, in comparison to T7dT versus poly(A) + RNA correlation, suggesting that T7SL is a reliable representation of fold changes distribution.
We compared the recently published T. cruzi transcriptome data  to present results. Normalized read counts showed low correlations when comparing same parasite stages (r = 0.26 to 0.56) (Additional file 4: Figure S3a), but this can be explained by several reasons, including (i) different parasite strains, (ii) distinct genome used for RNA-Seq alignments and (iii) distinct laboratorial procedures. However, when comparing the expression fold changes between epimastigotes to trypomastigotes (Fig. 3b and Additional file 4: Figure S3b), the correlation slightly improves (r~0.6), showing that, apart from distinct methods used, the biological meaning of both works was similar. When comparing epimastigote to trypomastigote DEGs (FDR < 0.01), about 64% (1463 of 2298) of significant genes detected using Li and collaborators data  were also detected in the present work (Additional file 4: Figure S4a). Furthermore, present T7SL data detected 952 epimastigote to trypomastigote DEGs that were not detected in Li data (Additional file 4: Figure S4a). When considering the most significative DEGs for T7SL method (FDR < 0.01, fold change > 2; 872 genes), about 53% (461) was also detected in Li data (Additional file 4: Figure S4b). Apart from the peculiarities of each method, 98% (448 of 461) of this common DEGs showed the same fold change direction (Additional file 4: Figure S4b-i and S4b-iv). Present data corroborate previous work on the T. cruzi steady state transcriptome [44, 47], showing the upregulation of several metabolism enzymes in epimastigote stage (Additional file 4: Figure S4b-i; Additional file 6) and several MASPs and trans-sialidases in trypomastigotes (Additional file 4: Figure S4b-iv; Additional file 6).
T7SL IVT improves parasite transcriptome analysis in host-parasite RNA mixtures
We further tested the T. cruzi transcriptome quantification in the following T. cruzi/HeLa RNA mixtures: (i) 900 ng HeLa plus 100 ng Epi (10% parasite RNA), (ii) 990 ng HeLa plus 10 ng Epi (1% parasite RNA) and (iii) two mixtures containing 0.1% of parasite RNA (4995 ng HeLa plus 5 ng Epi and 9990 ng HeLa plus 10 ng Epi). The 0.1% mixtures corresponds to about 106 human cells (around 5 μg of total RNA) together with 104 epimastigotes (around 5 ng of total RNA, see Additional file 4: Table S2). Bioanalyzer analysis demonstrates that the 0.1% parasite RNA mixture generate aRNA with length distribution profile resembling the one from T. cruzi aRNA, but with a less smooth pattern (Fig. 4a).
We conducted RNA-Seq experiments on aRNA generated from mixture samples and the resulting reads were aligned to a T. cruzi-H. sapiens concatenated genome in order to observe the percentage of aligned reads with best match on the parasite or human genome. For the 10% parasite RNA sample, around 94% of aligned reads were best aligned to T. cruzi genome (Fig. 4b); for the 1% parasite RNA sample, more than 60% of aligned reads showed a best match to the parasite genome; and when decreasing the parasite RNA proportion to only 0.1% of total mass, around 10% of the aligned reads have a best match to the parasite genome. Although this means a lower sequencing coverage for samples with low proportion of parasite mRNA, it is a significant reduction in read waste when comparing to mixture RNA samples without amplification, as 10% of aligned reads on a 0.1% original sample represents a 100-fold enrichment for parasite RNA. For HeLa pure samples, only a very small proportion of aligned reads (~0.02%) showed a better alignment to the T. cruzi genome (detailed below).
When comparing read counts per Supra Gene for pure and mixed samples using T7SL Pfx IVT aRNA (Fig. 4c), the parasite transcriptome quantification was very reproducible with Pearson correlations above 0.94 even for the 0.1% parasite RNA samples (Fig. 4c). Finally, we visualized read counts scatter plots when comparing epimastigote poly(A) + RNA to epimastigote T7SL Pfx IVT aRNA generated in the most diluted samples (Fig. 4c). When using the optimized protocol for T7SL Pfx IVT from 100 ng of pure parasite RNA, Pearson correlation was 0.78 to Poly(A) + RNA. Interestingly, aRNA generated from host-parasite RNA mixture containing only 0.1% of parasite RNA had a similar Pearson correlation (r = 0.77). This result indicates that T7SL amplification method performs better, in general, than PCR based mRNA amplification, since a recent study in T. brucei based on this technique reported a correlation of 0.38  between poly(A) + RNA and SL-based PCR amplified RNAs. Scatter plots of epimastigote to trypomastigote fold changes showed that when the percentage of epimastigote RNA on the mixture diminishes, the correlation to poly(A) + transcriptome decreases from 0.67 (10% epimastigote mass) to 0.50 (0.1% epimastigote) indicating a higher impact of stochastic factors when the parasite RNA input is too low (Fig. 4d).
Minimal RNA mass input for optimal T7SL IVT
We sequenced T7SL IVT aRNA generated from 100 to 3 ng of input total RNA and also from 100,000 sorted epimastigotes (around 68 ng of input total RNA). Results showed that even when pushing down the input mass to 3 ng, the transcriptome quantification shows a very good correlation (r = 0.90) to Poly(A) + RNA. In general, the global RNA quantification distribution is maintained, hence even with few nanograms of RNA (or 100,000 sorted parasites) we can quantify the parasite transcriptome with satisfactory precision (Pearson correlation higher than 0.9, Fig. 6c).
T7SL enable DEGs detection independently of parasite RNA mass input and purity
Next, we compared the identified DEGs in trypomastigote versus epimastigote samples, using five distinct comparisons: (i) purified poly(A) + RNA obtained for both parasite stages (gold standard RNA-Seq method), (ii) pure epimastigote aRNA produced from 100 ng of total RNA, (iii) the most limiting mass used for T7SL IVT (3 ng of epimastigote RNA), (iv) the lower percentage of epimastigote RNA in a host-parasite mixture (0.1% of parasite RNA) and (v) epimastigote RNA obtained by sorting of 105 parasites. Comparisons (ii) to (v) were against T7SL IVT trypomastigote aRNA. After independent calculation of DEGs for each comparison, we compare the list of DEGs (considering an FDR of 1%) and plotted the overlap of DEGs in an Euler diagram (Fig. 7c). A vast majority of putative DEGs detected by gold standard poly(A) + RNA method were also detected by T7SL IVT method, even when using a limiting input mass of 3 ng. In general, T7SL IVT has reliable performance in DEGs detection even in the most extreme conditions of low input or low percentage of parasite RNA.
Direct RNA-Seq analysis of parasites in their hosts is a difficult task due to the low amount of RNA available to be extracted directly from infected tissue in comparison to the higher amount of co-purified host RNA. When the parasitemia is relatively high, typical poly(A) + RNA-Seq can be used to simultaneously capture the parasite and host cell transcriptomes, as recently showed for T. cruzi [44, 48], T. brucei  and Leishmania . Li and collaborators  analyzed the T. cruzi transcriptomic modulation associated with the establishment of intracellular T. cruzi infection. In the initial times of infection (4 to 24 h), when the number of parasites to host cells is not higher than 1:1, generally less than 5% of the mixed host-parasite read pool mapped to the T. cruzi genome, reflecting the larger proportion of the host cell transcriptome. Hence, in cases where the relative number of parasites to host cells is much smaller than 1:1, the fraction of reads mapping to parasite genome will be too small. In this scenario, the parasite specific mRNA amplification method showed here is of great utility. The T. cruzi/HeLa mixed samples showed that it is possible to quantify the parasite transcriptome even when only 0.1% of total RNA mass correspond to parasite RNA, which is similar to a mixture of 106 human cells and 104 T. cruzi epimastigotes. In this case, typical poly(A) + RNA-Seq will not efficiently capture the parasite transcriptome and the present method is especially useful for low parasitemia samples or low input parasite total RNA. Considering that all mRNAs from trypanosomatids possess the same specific sequence at the 5′ region, the spliced leader or mini-exon, they could be selectively amplified through a SL-bearing primer. Although the SL trapping method theoretically provides a means of analyzing parasite transcriptome in complex RNA samples (such as host-parasite mixtures) , this hypothesis was only recently addressed by Mulindwa and collaborators . Using a method similar to SL trapping, these authors synthesize T. brucei-specific cDNA by priming with a SL-specific oligo, followed by PCR amplification using nested primers to evaluate the T. brucei transcriptome in host-parasite RNA mixtures consisting of only 0.1% of T. brucei RNA. Their PCR based amplification of SL-containing mRNAs allowed comparison of different samples as long as they were all treated in the same way, but introduced significant bias when comparing to gold standard poly(A) + RNA-Seq .
Our method uses a different procedure, IVT for RNA amplification, as it linearly amplifies original mRNAs . As already pointed out by Mulindwa and collaborators , SL priming followed by PCR-based cDNA amplification distorted relative abundances of the cDNA products and cannot be used, by itself, to measure absolute mRNA levels. Although the present T7SL IVT method also showed some bias on transcriptome quantification, the amplification distortion was smaller. When comparing transcriptome read counts distributions, the T7SL IVT method showed higher correlations to poly(A) + RNA (r = 0.77, for the 0.1% mixture) in comparison to PCR-based method (r = 0.34) . This improvement is important to increase the ability to detect true changes in the expression profile, specially with low amount RNA or low ratio of parasite to total RNA samples. Furthermore, if comparing samples produced by the same T7SL IVT protocol, the correlation was very good even when using complex mixed samples or low input RNA, indicating that the bias created by the method, when compared to poly(A) + RNA, are gene-specific and not influenced by the RNA sample complexity.
The higher ability to detect epimastigote versus trypomastigote DEGs when using samples produced by the T7SL IVT method corroborate the idea that linear RNA amplification enables a more reproducible transcriptome quantification [12, 13, 17]. When analyzing samples with very low amount of parasite RNA, the lower proportion of parasite reads can be resolved by a larger amount of sequencing data, obviously with significant increase in cost. Even though T7SL IVT amplifies host RNA, present method showed at least 100-fold enrichment of parasite mRNA on resulting RNA-Seq reads that can be directly translated to a similar decrease in sequencing costs.
Other useful characteristic of the T7SL IVT method is the 5′ bias of the generated reads. Due to the fact that 5′-UTR regions are significantly smaller than 3′-UTR , usually 30 to 70 nt, a higher proportion of reads will align inside CDS regions. These regions are more diverse than UTRs and hence their correct mappability is higher in general.
We concluded that the T7SL IVT method has several advantages for those researchers preparing trypanosomatid RNA-Seq libraries whenever the parasite RNA mass is a concern. As long as all samples are treated equally, T7SL IVT allows a powerful detection of putative DEGs in complex samples, including low input and host-parasite mixtures (especially for low parasitemia samples).
Although we developed T7SL IVT method using T. cruzi total RNA as template for amplification, this method can be easily adapted for any trypanosomatid species by only modifying the T7SL oligonucleotide. T7SL IVT opens new perspectives for trypanosomatid studies, specially parasite transcriptome analysis directly in their hosts.
We thank Paulo Rodrigo Claure Arauco for technical support on RNA sequencing.
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
Availability of data and materials
All RNA-Seq data have been deposited in NCBI Gene Expression Omnibus and are accessible through GEO Series accession number GSE94766. The genome data used in this study is deposited at DDBJ/ENA/GenBank under the accession MBSY00000000.
CMP, MAK and DPP developed the key idea. RLK and CMP did all the experiments. RLK and CMP analyzed the RNA-Seq data. All authors wrote, read, and approved the final manuscript.
The authors declare that they have no competing interest.
Ethics approval and consent to participate
Consent for publication
- 10.Morris J, Singh JM, Eberwine JH. Transcriptome analysis of single cells. J Vis Exp. 2011;50:e2634.Google Scholar
- 13.Polacek DC, Passerini AG, Shi C, Francesco NM, Manduchi E, Grant GR, Powell S, Bischof H, Winkler H, Stoeckert CJ Jr, et al. Fidelity and enhanced sensitivity of differential transcription profiles following linear amplification of nanogram amounts of endothelial mRNA. Physiol Genomics. 2003;13(2):147–56.CrossRefPubMedGoogle Scholar
- 24.Nilsson D, Gunasekera K, Mani J, Osteras M, Farinelli L, Baerlocher L, Roditi I, Ochsenreiter T. Spliced leader trapping reveals widespread alternative splicing patterns in the highly dynamic transcriptome of Trypanosoma brucei. PLoS Pathog. 2010;6(8):e1001037.CrossRefPubMedPubMedCentralGoogle Scholar
- 33.Kessler RL, Gradia DF, Pontello Rampazzo Rde C, Lourenco EE, Fidencio NJ, Manhaes L, Probst CM, Avila AR, Fragoso SP. Stage-regulated GFP expression in Trypanosoma cruzi: applications from host-parasite interactions to drug screening. PLoS One. 2013;8(6):e67441.CrossRefPubMedPubMedCentralGoogle Scholar
- 50.Fernandes MC, Dillon LA, Belew AT, Bravo HC, Mosser DM, El-Sayed NM. Dual Transcriptome profiling of Leishmania-infected human macrophages reveals distinct reprogramming signatures. MBio. 2016;7(3):e00027-16.Google Scholar
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.