Background

Cryptosporidiosis in humans is caused primarily by Cryptosporidium parvum and C. hominis (Phylum Apicomplexa). Infection with these protozoans is the second-most frequent cause of diarrhea in infants living in developing nations [1] and is relatively common in immunocompromised individuals [2, 3]. As typically observed with other coccidia, rapid multiplication of the parasite in the intestinal epithelium compromises intestinal function and leads to diarrhea and malabsorption. Although numerous publications have described modifications of the original method for culturing Cryptosporidium [4, 5], our ability to grow these parasites in cell monolayers remains unsatisfactory. Our knowledge of the interaction between host cell and parasite is primarily based on the annotation of the Cryptosporidium genome, which has revealed the absence of several biosynthetic pathways and inferred the dependence of the replicating parasite on host cell metabolites [6].

Studying the interaction of Cryptosporidium parasites with the host cell remains a difficult undertaking. Parasite development is not synchronous, and the proportion of infected monolayer cells is variable and difficult to measure. As a consequence, intracellular stages, particularly later developmental stages, have been studied less frequently than the oocyst stage. The transcriptional response of cell monolayers to the presence of C. parvum meronts has been investigated with microarrays and reverse-transcription (RT) PCR [7,8,9,10,11,12]. Studies in monolayers of human HCT-8 cells infected with C. parvum have uncovered morphological changes reminiscent of apoptosis [13, 14], heat-shock and inflammatory responses [7], cytoskeleton modifications [15] and modifications of the host cell membrane [16]. RNA-Seq has recently been used to analyze the C. parvum transcriptome in cell monolayers and in experimentally infected calves, but to date no analysis of these data appears to have been published. Here, we report on the analysis of the transcriptional response of pig intestinal epithelial cells to the initial stage of C. parvum merogony and compare functional properties of the host and parasite transcriptomes in the early phase of merogony.

Methods

Parasites and cell lines

Cryptosporidium parvum oocysts

Fecal samples from diarrheic calves raised in Woodstock, Connecticut, were screened for the presence of Cryptosporidium oocysts using acid-fast stained fecal smears. One sample with a high concentration of oocysts (3 × 10⁷ oocysts/ml feces) was selected. Oocysts were extracted on a density gradient of 15–30% Nycodenz (Alere Technologies, Oslo, Norway) as described previously [17]. Oocyst concentrations were determined using a hemocytometer at 400× magnification. The species of this isolate was confirmed using BLAST analysis of sequences obtained as described in the following paragraph. Of 10 randomly selected 101-nt RNA-Seq reads obtained from one of the infected monolayers and which mapped to the C. parvum IOWA genome, 8 sequences were 100% identical to C. parvum sequences in the NCBI nucleotide collection, one sequence was 100% identical to C. parvum and to C. hominis, and for one sequence no significant Cryptosporidium hits were found. Based on this analysis, and consistent with the host origin of the oocysts, we conclude that the isolate used in these experiments is C. parvum. The transcriptome of oocysts of isolate TU114 [18] was analyzed using RNA-Seq as described below.

Infection of cell monolayers

Monolayers of pig jejunal epithelial cells (IPEC-J2) [19] were grown to near-confluence in four 75 cm² flasks using DMEM/F12 media (Life Technologies) with 5% fetal bovine serum. Oocysts were surface-sterilized with 10% bleach. Monolayers were infected [13, 14] with a dose equivalent to 1.4 × 10⁵ oocysts/cm², which corresponds to approximately 1 oocyst/cell. Cultures were incubated at 37 °C in a humidified incubator with 5% CO₂ for 24 h.

Immunofluorescence

Immunofluorescence was used to confirm the infection of the cell monolayers. Infected and control monolayers were washed 3 times with PBS to remove unexcysted oocysts. Monolayers were then fixed with methanol at room temperature for 15 min. Following fixation, monolayers were washed 3 times with PBS and blocked with DMEM medium supplemented with 5% fetal bovine serum at room temperature for 15 min. Following one more wash with PBS, 100 μl of 5 μg/ml monoclonal antibody 2E5 (a gift from Dr. Abhineet Sheoran) conjugated with Fluorescein-5-Isothiocyanate (FITC) was added to each monolayer and the plates were incubated at room temperature for 30 min. Antibody 2E5 reacts with intracellular stages of C. parvum. Plates were dried in the dark and read with an inverted epifluorescent microscope using a 40× objective.

Molecular biology methods

Total RNA was extracted from infected and uninfected cells with a PureLink RNA Mini Kit (Life Technologies). Samples were lysed and homogenized in the presence of guanidinium isothiocyanate. After homogenization, ethanol was added to the sample. The sample was then processed through a Spin Cartridge containing a clear silica-based membrane provided in the kit, to which the RNA binds. Impurities were removed by subsequent washing with the wash buffers provided. Purified RNA was eluted in RNase-free water. DNA was removed using DNase (DNA-free™, Life Technologies). The quality of the RNA was assessed by reading the 260/280 nm absorbance ratio. The RNA Integrity Number (RIN) was determined using an Agilent Bioanalyzer 2100. An Illumina (TruSeq Stranded RNA library) kit was used to make the cDNA libraries from RNA extracted from four infected and four uninfected 75 cm² cell monolayers harvested 24 h post-infection. The eight cDNA libraries were subjected to cluster generation and single-end 100-nucleotide sequencing on an Illumina HiSeq 2500 at the Tufts Genomics core facility (tucf.org). Sequencing data were deposited in the European Nucleotide Archive under project accession number PRJEB17685.

Gene expression analysis

The S. scrofa reference genome and annotation (susScr3) were downloaded from iGenome (http://support.illumina.com/sequencing/sequencing_software/igenome.html). The C. parvum IOWA isolate [20] genome and annotation (version 34) were downloaded from the Cryptosporidium Genomics Resource database CryptoDB.org [21]. Each RNA-Seq sample was randomly subsampled to 7 million reads to obtain a dataset which could more easily be processed with available computational resources. Sequences were converted from FASTQ to FASTA format and subsampled in mothur [22]. Reads were mapped to the pig genome using HiSat2 [23] as implemented in Galaxy (usegalaxy.org) [24]. Reads that did not align to the pig genome were subsequently mapped, also with HiSat2, to the C. parvum IOWA genome to estimate the proportion of parasite transcripts in relation to the combined host-parasite transcriptome. A table of FPKM (fragments per kilobase of transcript per million mapped reads) for the RNA-Seq data which mapped to the S. scrofa genome was created with Cufflinks [25]. Cufflinks returned FPKM values for 4939 S. scrofa genes. The correlation between FPKM values from replicate samples was visualized as shown in Additional file 1: Figure S1. Differentially expressed genes were identified using DESeq2 [26] as implemented in Galaxy using one HiSat2 output file for each of the eight transcriptome samples. Principal Coordinates Analysis (PCoA) was performed with GenAlEx [27]. Pairwise distances between samples were calculated using the SSR metric, obtained by summing, over all genes, the squared difference in FPKM between two samples. Alternatively, the Euclidean distance was used. Analysis of Similarity (ANOSIM) [28] was used to test the significance of the clusters revealed by PCoA. ANOSIM was run in mothur.
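The SSR distance described above can be illustrated with a short Python sketch. This is not the GenAlEx implementation; the function names and the toy FPKM values are hypothetical and serve only to show how the pairwise distance is built from per-gene FPKM differences.

```python
from itertools import combinations


def ssr_distance(fpkm_a, fpkm_b):
    """Sum over all genes of the squared FPKM difference between two samples."""
    if len(fpkm_a) != len(fpkm_b):
        raise ValueError("samples must cover the same genes")
    return sum((a - b) ** 2 for a, b in zip(fpkm_a, fpkm_b))


def euclidean_distance(fpkm_a, fpkm_b):
    """Square root of the SSR distance."""
    return ssr_distance(fpkm_a, fpkm_b) ** 0.5


def pairwise_distances(samples, metric=ssr_distance):
    """Return {(name1, name2): distance} for every pair of samples."""
    return {
        (n1, n2): metric(samples[n1], samples[n2])
        for n1, n2 in combinations(sorted(samples), 2)
    }


# Hypothetical FPKM values for three genes in three samples (not study data).
toy_samples = {
    "control_1": [10.0, 0.0, 5.0],
    "control_2": [11.0, 1.0, 5.0],
    "infected_1": [14.0, 3.0, 5.0],
}
toy_matrix = pairwise_distances(toy_samples)
```

Because the Euclidean distance is the square root of the SSR distance, the two metrics rank sample pairs identically and differ only in how strongly large differences are weighted.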

The program LEfSe, as implemented in Galaxy at huttenhower.sph.harvard.edu/galaxy/, was used for Linear Discriminant Analysis (LDA) to identify marker genes, i.e. genes that best explain the difference between infected and control transcriptome samples. LDA was applied to a table of 8 samples × 4939 S. scrofa genes. The 8 × 4939 fields of the table represented FPKM values, where zero indicated that no sequence mapped to a particular gene.

Gene function and enrichment analyses were performed with DAVID [29]. The False Discovery Rate method [30] and Bonferroni correction were used to identify differentially transcribed genes or enriched functions. Shannon diversity is defined as H = -Σ p_i ln(p_i), where the sum is over all genes and p_i is the proportion of the total FPKM contributed by gene i. p_i was calculated by dividing each gene’s FPKM by the sum of all FPKM values in a sample, such that Σ p_i over all genes is equal to 1. Diversity calculations were performed in Microsoft Excel.
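The Shannon diversity calculation can be sketched as follows. The original calculations were done in Microsoft Excel; this Python version is only an illustration, the function name is hypothetical, and skipping genes with an FPKM of zero (for which p ln(p) tends to 0) is an assumption about how such genes were handled.

```python
import math


def shannon_diversity(fpkm_values):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over genes with FPKM > 0."""
    total = sum(fpkm_values)
    if total <= 0:
        raise ValueError("at least one gene must have a positive FPKM")
    h = 0.0
    for value in fpkm_values:
        if value > 0:  # assumption: zero-FPKM genes contribute nothing
            p = value / total  # proportion of the sample's total FPKM
            h -= p * math.log(p)
    return h


# Hypothetical example: four genes with equal FPKM give the maximum, ln(4).
even = shannon_diversity([5.0, 5.0, 5.0, 5.0])
# A sample dominated by a single gene is less diverse.
skewed = shannon_diversity([97.0, 1.0, 1.0, 1.0])
```

A transcriptome in which a few genes account for most of the FPKM signal therefore yields a lower H than one in which expression is spread evenly across genes.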

Results

Host cell and parasite transcriptome

The number of sequences from four infected and four control monolayers mapping to the S. scrofa and C. parvum genomes is shown in Table 1. RNA-Seq data from C. parvum oocysts were also mapped to the two genomes as a quality control. As expected, the proportion of oocyst reads mapping to the S. scrofa genome was close to zero (1037/(7 × 10⁶) = 0.014%), whereas 83.7% of oocyst reads mapped to the C. parvum genome (Table 1). From the mapping statistics, it is possible to estimate the proportion of parasite transcripts in relation to the host transcriptome. According to Table 1, the average number of RNA-Seq reads that aligned uniquely and > 1 time to the C. parvum genome is 117,397 (n = 4; SD = 7416), which is 2.20% of the number of reads aligning to the S. scrofa genome (5,322,039; n = 8; SD = 60,029). The extent of infection of IPEC-J2 cell monolayers at 24 h post-infection was evaluated using immunofluorescence (Additional file 2: Figure S2). The immunofluorescence pattern indicated that about 50% of cells were infected. Based on these data, we estimate that 4.40% (2.20% × 2) of the transcripts in infected cells originate from C. parvum. Given that the S. scrofa genome counts about 12 times more genes than the C. parvum genome (46,161 vs 3880 genes), and the number of pig transcripts was 45.3 times higher than the number of C. parvum transcripts, the host cell transcriptome is approximately four times more abundant on a per-gene basis than the parasite transcriptome. Mean Shannon diversity for the host cell transcriptome was 6.328 (SD = 0.0498, n = 8). For the parasite transcriptome, diversity was significantly lower (mean = 6.265, SD = 0.0289, n = 4; t = -2.299, P = 0.044).
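The arithmetic behind these estimates can be retraced with the short sketch below. The read counts, gene counts and infection rate are taken from the text; the variable names are illustrative, and the scaling to infected cells assumes that infected and uninfected cells contribute host reads at the same rate.

```python
# Values quoted in the text (Table 1 summary); variable names are illustrative.
mean_parasite_reads = 117_397   # mean reads aligning to C. parvum (n = 4)
mean_host_reads = 5_322_039     # mean reads aligning to S. scrofa (n = 8)
fraction_cells_infected = 0.5   # estimated from immunofluorescence
host_genes = 46_161
parasite_genes = 3_880

# Parasite reads as a fraction of host reads across the whole monolayer.
parasite_to_host = mean_parasite_reads / mean_host_reads

# Scale to infected cells only (assumes equal host read yield per cell).
parasite_to_host_infected = parasite_to_host / fraction_cells_infected

# Host transcript abundance relative to parasite transcripts, per gene.
read_ratio = mean_host_reads / mean_parasite_reads
gene_ratio = host_genes / parasite_genes
per_gene_ratio = read_ratio / gene_ratio

print(f"parasite/host reads, whole monolayer: {parasite_to_host:.2%}")
print(f"parasite/host reads, infected cells:  {parasite_to_host_infected:.2%}")
print(f"host/parasite read ratio:             {read_ratio:.1f}")
print(f"host/parasite gene ratio:             {gene_ratio:.1f}")
print(f"host/parasite per-gene abundance:     {per_gene_ratio:.1f}")
```

Small differences from the percentages quoted in the text can arise from rounding of the intermediate values.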

Table 1 Summary of sequence reads mapping to the genome of Sus scrofa and Cryptosporidium parvum

We compared the host cell and parasite transcriptomes in relation to function. The results of a function enrichment analysis for the 100 C. parvum and 100 S. scrofa genes with the highest mean FPKM are shown in Tables 2 and 3, respectively. The results show that the parasite transcriptome encodes primarily functions annotated as ribosome biogenesis and translation. The analogous analysis to identify enriched functions was performed with the 100 S. scrofa genes with the highest mean FPKM (Table 3). This analysis reveals a similar pattern of enriched ribosomal functions. However, in contrast to the C. parvum transcriptome, other enriched functions such as “acetylation”, “Ubl conjugation” and functions related to the extracellular compartment were also found.

Table 2 Significantly enriched functions in the C. parvum intracellular transcriptome based on the analysis of 100 genes with the highest FPKM values
Table 3 Significantly enriched functions in the host cell transcriptome based on the analysis of 100 S. scrofa genes with the highest FPKM values

The results of the enrichment analysis of highly expressed genes are consistent with the slightly, but significantly, higher diversity of the host cell transcriptome described above. Whereas in the parasite transcriptome functions relating to ribosome or translation are enriched, in the host other functions were also enriched. The difference in function diversity is represented in Fig. 1. In the bar graphs genes are ranked from left to right in order of diminishing FPKM. Genes encoding ribosome-related functions are represented with light grey. The juxtaposition of the host and parasite transcriptome clearly illustrates the higher proportion of ribosomal functions in highly expressed genes in the parasite transcriptome, as compared to the transcriptome of the host. Moreover, the plots also show the difference in FPKM diversity which is apparent as a more negative slope in the C. parvum FPKM rank-abundance plot, as compared to the S. scrofa plot.

Fig. 1

Rank abundance analysis of host and parasite transcriptome. In each graph, host and parasite genes are ranked from left to right in order of diminishing FPKM. Each vertical bar represents a gene. Grey and black bars represent genes encoding ribosomal functions and non-ribosomal functions, respectively. The 500 genes with the highest FPKM are represented in the two bar graphs on the left: top, S. scrofa; bottom, C. parvum. To convey a clearer view of the function of highly transcribed genes in relation to ribosome-related functions, the 100 genes with the highest FPKM value are shown on the right. Note the high proportion of genes encoding ribosomal function in C. parvum as compared to the host cell transcriptome. The steeper slope of the ranked C. parvum FPKM values is consistent with the lower Shannon diversity described in the text

Differentially expressed S. scrofa genes in infected and control monolayer cells

Having gained insight into the profile of the host cell and parasite transcriptome in infected cells, we investigated whether the host cell transcriptome is affected by the infection with C. parvum. PCoA was used to visualize the global difference between the transcriptome of infected and control IPEC-J2 monolayers. PCoAs based on Euclidean distance or on SSR distance [31] gave a similar clustering of infected and control samples. Figure 2 shows the plot based on SSR distance. Consistent with the separation apparent in the PCoA, Analysis of Similarity (ANOSIM) confirmed that clustering according to the experimental treatment (infected vs control) is significant (R = 0.864, P = 0.028).
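For readers unfamiliar with ANOSIM, the sketch below shows how the R statistic and a permutation P value are obtained from a distance matrix. It is a minimal illustration of the general method, not the mothur implementation used in this study; the function names, the number of permutations and the toy distance matrix are assumptions made for the example.

```python
import random
from itertools import combinations


def rank_values(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def anosim_r(dist, labels):
    """R = (mean between-group rank - mean within-group rank) / (M / 2),
    where M = n(n-1)/2 is the number of pairwise distances."""
    pairs = list(combinations(range(len(labels)), 2))
    ranks = rank_values([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)


def anosim(dist, labels, permutations=999, seed=0):
    """Return (R, P); P is estimated by randomly shuffling the group labels."""
    rng = random.Random(seed)
    observed = anosim_r(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical 4-sample distance matrix with two well-separated groups.
toy_labels = ["control", "control", "infected", "infected"]
toy_dist = [
    [0, 1, 5, 6],
    [1, 0, 7, 8],
    [5, 7, 0, 2],
    [6, 8, 2, 0],
]
toy_r, toy_p = anosim(toy_dist, toy_labels)
```

R approaches 1 when all between-group distances exceed all within-group distances. Note that with four infected and four control samples there are only 35 distinct ways of splitting the samples into two groups of four, so a permutation P value for this design cannot fall much below 1/35 ≈ 0.029.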

Fig. 2

Principal Coordinates Analysis of FPKM data from eight cell monolayers. Analysis is based on FPKM values from 4939 S. scrofa genes. Pairwise distances were calculated using the SSR metric. Black and red symbols represent the transcriptomes of uninfected and infected monolayers, respectively. Clustering by treatment is statistically significant according to ANOSIM

We used DESeq2 to identify S. scrofa genes which are significantly up- or downregulated in response to the infection. A total of 810 host genes were found to be differentially expressed at an FDR < 0.05 (Additional file 3: Table S1) in a comparison of four infected and four control monolayers at 24 h post-infection. A function enrichment analysis of genes overexpressed in infected cells found several terms associated with cell division, possibly reflecting damage to the monolayer leading to loss of contact inhibition and resumption of mitosis (Table 4). However, in cells that remained in the monolayer, no transcriptional changes indicative of stress or apoptosis were identified.

Table 4 Significantly enriched functions in upregulated S. scrofa genes in C. parvum infected cell monolayers

The overexpression of cell cycle related pathways is consistent with a pathway analysis of the same set of up- and downregulated genes. Spliceosome and Cell cycle were the only two KEGG pathways significantly enriched in the infected samples (FDR < 0.05; Additional file 4: Table S2). These terms do not feature in the analogous pathway analysis of the genes upregulated in uninfected monolayers (Additional file 5: Table S3). Corroborating the analysis of enriched gene function, the functions of 39 genes significantly overexpressed according to LEfSe analysis [32] broadly overlapped with those shown in Table 4; phosphoprotein was again the highest scoring function with a 7.5-fold enrichment and an FDR value of 0.003. Acetylation, focal adhesion and glycoprotein also featured on the list of ten function terms identified by LEfSe.

Discussion

The analysis of transcriptional changes in cells infected with C. parvum is providing new insights into the host response to the infection. Previous studies have shown that apoptosis of host epithelial cells is both induced and inhibited by the infection of C. parvum [12, 14, 33,34,35]. The data presented here support the latter. Understanding the role that apoptosis plays in the pathogenesis of cryptosporidiosis is relevant, because it may help explain the mechanisms leading to diarrhea and blunting of intestinal villi, which are hallmarks of cryptosporidiosis [36]. Since different studies use different cell lines, parasite doses, incubation times and analytical techniques, it is not surprising that they lead to different conclusions. IPEC-J2 cells have been reported to be capable of undergoing apoptosis [37], indicating that a deficiency in apoptosis pathways is not at the root of our observations. IPEC-J2 cells originate from a pig, as opposed to the more commonly used human cell lines such as HCT-8. In addition, these cells were isolated from the jejunal epithelium, as opposed to the colon, from where HCT-8 and CaCo-2 cells originate. We chose to work with IPEC-J2 cells because of our interest in eventually extending the transcriptome analyses to C. parvum development in vivo and comparing gene expression in vivo and in culture. Germ-free neonatal piglets are highly susceptible to various Cryptosporidium species [38, 39]. Moreover, the large size of the GI tract makes them ideal models for comparing Cryptosporidium development in vivo and in vitro in future studies.

The identification of host cell transcripts differentially expressed in infected and control monolayers is significant given the short duration of the infection. What may explain the rapid transcriptional response is the fast rate of parasite asexual multiplication. At 24 h post-infection, some parasites may already be in the second generation of merogony [4]. During merogony, when the rate of parasite replication is at its peak, the demand for host cell metabolites, which the parasite is unable to synthesize, and the demand for energy are likely to be high, possibly upregulating related biosynthetic pathways in the host. The fact that such biosynthetic pathways do not feature among those most upregulated in infected monolayers (Additional file 4: Table S2) may indicate that the extra metabolite demand from the developing parasite is not sufficient to impact host cell transcription at a detectable level, or that this demand is met by the host cell through increased activity of biosynthetic enzymes, rather than by upregulating transcription. Since not all cells in a monolayer are infected, the DESeq2 analysis is likely to underestimate the extent of differential expression. Sus scrofa genes with the highest level of differential expression are up- or downregulated by about 6-fold (Additional file 3: Table S1, column log2(FC)), indicating that in individual cells, differential expression could be 10-fold or higher.
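The reasoning behind the last estimate can be made explicit with a simple mixing model, sketched below. The model is an assumption introduced for illustration only: it treats the bulk measurement as a weighted mean of infected cells and of uninfected cells that express the gene at the control level, and it ignores any secondary effects on uninfected cells. The function name is hypothetical, and the sketch applies most directly to upregulated genes.

```python
def fold_change_in_infected_cells(observed_fc, fraction_infected):
    """Back-calculate the fold change in infected cells from a bulk measurement.

    Assumed model: bulk expression is a weighted mean of infected cells
    (fold change x) and uninfected cells (fold change 1):
        observed_fc = fraction_infected * x + (1 - fraction_infected)
    """
    if not 0 < fraction_infected <= 1:
        raise ValueError("fraction_infected must be in (0, 1]")
    x = (observed_fc - (1 - fraction_infected)) / fraction_infected
    if x <= 0:
        raise ValueError("observed fold change is incompatible with this model")
    return x


# A 6-fold upregulation measured in a monolayer in which about half of the
# cells are infected corresponds to an 11-fold change in the infected cells.
upregulated = fold_change_in_infected_cells(observed_fc=6.0, fraction_infected=0.5)
```

Under these assumptions, the lower the proportion of infected cells, the more the bulk measurement understates the change occurring in individual infected cells.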

Attempts to flow-sort populations of infected and uninfected cells to improve the resolution of differential expression analyses have been reported [12]. The drawback of this approach is that the additional manipulations required to release and sort infected from uninfected cells are difficult to standardize, and minor differences in processing could potentially affect transcription or mRNA turnover. The limitations of the culture systems also preclude us from distinguishing between transcriptional changes induced by multiplying intracellular meronts and secondary changes triggered, for instance, by the disruption of the monolayer. Gene ontology analysis indicates that genes involved in cell proliferation were upregulated in infected monolayers, possibly indicating a secondary effect triggered by release of cells, loss of contact inhibition and cells re-entering the mitotic cycle. Newer methods that make single-cell RNA-Seq possible might be a viable approach to improve the resolution of transcriptome analysis. Combining this approach with emerging cell culture techniques which support the entire life-cycle [40, 41] could provide access to later stages in the life-cycle and eventually generate a complete picture of parasite and host cell gene regulation during the life-cycle.

The upregulation of genes related to glycoproteins observed in this study was also detected in an earlier microarray analysis [12]. On the other hand, differentially expressed functions reported by others, such as structural proteins and markers of stress, were not found to be upregulated in the present study. This difference could be a consequence of our study using a different cell line, differences in the intensity of the infection, a difference in parasite virulence or perhaps the signal being below the sensitivity of the assay. As discussed above, the different results generated by quantitative RT-PCR, microarrays and RNA-Seq are not surprising. Additional research will be required to generate a validated host cell transcriptional profile in response to C. parvum infection.

Conclusions

The analysis of the transcriptome of cell monolayers infected with C. parvum and uninfected controls revealed cellular functions differentially regulated in response to the infection. However, stress- and apoptosis-related genes were not impacted. The comparison of the combined host-parasite transcriptome showed that C. parvum gene expression is less diverse and is highly enriched for genes encoding ribosomal functions.