Exploratory analysis of transposable elements expression in the C. elegans early embryo
Transposable Elements (TE) are mobile sequences that make up large portions of eukaryotic genomes. The functions they play within the complex cellular architecture are still not clearly understood, but it is becoming evident that TE have a role in several physiological and pathological processes. In particular, it has been shown that TE transcription is necessary for the correct development of mouse embryos and that their expression is able to finely modulate transcription of coding and non-coding genes. Moreover, their activity in the central nervous system (CNS) and other tissues has been correlated with the creation of somatic mosaicism and with pathologies such as neurodevelopmental and neurodegenerative diseases as well as cancers.
We analyzed TE expression among different cell types of the Caenorhabditis elegans (C. elegans) early embryo, asking if, where and when TE are expressed and whether their expression is correlated with genes playing a role in early embryo development. To answer these questions, we took advantage of a public C. elegans embryonic single-cell RNA-seq (scRNA-seq) dataset and developed a bioinformatics pipeline able to quantify reads mapping specifically against TE, avoiding counting reads mapping on TE fragments embedded in coding/non-coding transcripts. Our results suggest that i) canonical TE expression analysis tools, which do not discard reads mapping on TE fragments embedded in annotated transcripts, may over-estimate TE expression levels, ii) Long Terminal Repeat (LTR) elements are mostly expressed in undifferentiated cells and might play a role in pluripotency maintenance and activation of the innate immune response, iii) non-LTR elements are expressed in differentiated cells, in particular in neurons and nervous system-associated tissues, and iv) DNA TE are homogeneously expressed throughout C. elegans early embryo development.
TE expression appears finely modulated in the C. elegans early embryo and different TE classes are expressed in different cell types and stages, suggesting that TE might play diverse functions during early embryo development.
Keywords: Transposable elements, Caenorhabditis elegans, Early embryo, Embryogenesis, RNA-Seq, Single-cell, Bioinformatics
C. elegans: Caenorhabditis elegans
CNS: Central nervous system
ESC: Embryonic stem cells
iPSCs: Induced pluripotent stem cells
LINE: Long interspersed nuclear elements
lncRNAs: Long non-coding RNAs
LTR: Long terminal repeats
RNAi: RNA interference
RPM: Reads per million mapped reads
SINE: Short interspersed nuclear elements
Transposable Elements (TE) are repetitive elements spread among the genomes of almost all eukaryotes. TE can be classified into transposons and retrotransposons according to their mechanism of transposition. Transposons comprise DNA and rolling-circle (RC) elements and mobilize through a DNA intermediate, while retrotransposons comprise Long Terminal Repeat (LTR) and non-LTR (LINE and SINE) sequences that take advantage of an mRNA intermediate for their mobilization [1, 2]. TE make up a large portion of the human and murine genomes (40–45%) and, despite having been understudied and often considered junk and selfish elements, they are currently believed to have played, and to continue to play, important roles in the biology and evolution of metazoans [2, 3, 4, 5, 6]. One of the first observations of the existence and activity of TE was made in Drosophila melanogaster, where specific outcrosses displayed sterility and other germline abnormalities collectively defined as hybrid dysgenesis. Further observations led to the discovery that these phenotypes were due to the lack of silencing, in the specific outcrosses, of the P-element (a DNA transposon) and elucidated the molecular mechanisms causing the phenomenon. Later, Mello and Fire discovered that Caenorhabditis elegans (C. elegans) mutants deficient for RNA interference (RNAi) displayed increased TE mobilization and proposed that the RNAi system has also evolved as a defence response to protect the germline from TE activity. Nowadays, although TE activity in the gonads might represent a driving force in genome evolution, it is accepted that it is mostly inhibited by the PIWI/piRNA pathway. Findings in the last decade highlighted that TE mobilization is not confined to germ cells and cancer tissues. Indeed, TE are expressed and active during embryogenesis [10, 11, 12, 13, 14, 15, 16, 17] and even in the adult central nervous system (CNS) [5, 18, 19, 20, 21, 22, 23].
TE (mostly LTR) have been proposed to play fundamental roles during embryogenesis, when they shape gene expression by acting as regulatory elements, providing promoters and binding sites, regulating chromatin accessibility, and physically interacting with transcripts [10, 11]. Several studies showed that TE are needed during mammalian embryogenesis in diverse biological processes such as pluripotency maintenance, embryo viability and immune response priming [12, 13, 14, 15]. According to these studies, neither the complete lack of expression nor the over-expression of TE is compatible with the correct development of the mammalian embryo, suggesting that the expression of TE is strictly regulated during mammalian embryogenesis. Finally, TE have been suggested to play a dual role in the CNS of organisms such as fruit fly, mouse and human. On the one hand, the activity of retrotransposons in the CNS determines somatic mosaicism [5, 18, 19, 20, 21, 22, 23], which has been proposed to be correlated with the evolution of cognitive capabilities [19, 20, 22]. On the other hand, alterations of their expression and activity have been associated with neurodevelopmental and neurodegenerative disorders [24, 25, 26, 27].
C. elegans is a ~1 mm long nematode widely used as a model organism. Its maintenance under laboratory conditions is very simple: the transparent nematode has a short generation time (3–4 days), its food source is Escherichia coli and up to 1000 worms can be cultured at the same time in a 55 mm petri dish. C. elegans embryogenesis lasts ~16 h; its embryonic cell lineage was the first metazoan lineage to be completely mapped, in the early 1980s, and a name has been assigned to every embryonic cell. In the early stages, five asymmetric divisions produce six founder cells: AB, MS, E, C, D, and P4. In more detail, the P0 zygote gives rise to a larger anterior cell, AB, and a smaller posterior blastomere, P1 (2-cell stage). P1 undergoes an asymmetric division that gives rise to EMS and P2, while AB divides symmetrically to give rise to ABa and ABp (4-cell stage). Subsequent asymmetric divisions of EMS into MS and E and of P2 into C and P3, together with symmetric divisions of ABa and ABp, which generate ABal, ABar, ABpl and ABpr, characterize the 8-cell stage. The further divisions of the 8 cells complete the generation of the founder cells whose descendants will produce specific cell types (16-cell stage). C. elegans gene manipulation can be carried out in simple and very effective ways [30, 31]. The adult is composed of about 1000 somatic cells, 302 of which are neurons. Approximately 15% of its genome derives from TE. Unlike the fruit fly, mouse and human genomes, in which the majority of TE are retrotransposons, C. elegans TE are mostly DNA transposons (Additional file 1). Globally, 74% of C. elegans TE are annotated as DNA transposons, 16% as RC transposons and 10% as retrotransposons (1% SINE, 4% LINE, 5% LTR). According to the literature, the Tc/Mar family (DNA TE) is the most active, while active retrotransposition has never been observed under laboratory conditions [32, 33]. To our knowledge, no study has ever been performed on the expression of TE during C. elegans development.
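The early divisions described above can be written down as a small lineage map, which is convenient when assigning the sequenced cell types to a lineage. The following is a minimal Python sketch that encodes only the divisions named in the text, up to the 8-cell stage. The names DIVISIONS, cells_at_round and ancestry are illustrative assumptions, and treating each stage as one synchronous round of division is a simplification.

```python
# Early C. elegans divisions named in the text (mother -> two daughters).
# Only divisions up to the 8-cell stage are encoded here.
DIVISIONS = {
    "P0": ("AB", "P1"),
    "AB": ("ABa", "ABp"),
    "P1": ("EMS", "P2"),
    "ABa": ("ABal", "ABar"),
    "ABp": ("ABpl", "ABpr"),
    "EMS": ("MS", "E"),
    "P2": ("C", "P3"),
}


def cells_at_round(n_rounds):
    """Return the cells present after n_rounds of division from the zygote."""
    cells = ["P0"]
    for _ in range(n_rounds):
        next_cells = []
        for cell in cells:
            # Cells without a recorded division are carried over unchanged.
            next_cells.extend(DIVISIONS.get(cell, (cell,)))
        cells = next_cells
    return cells


def ancestry(cell):
    """Trace a cell back to the zygote, e.g. to assign it to a lineage."""
    mother_of = {d: m for m, ds in DIVISIONS.items() for d in ds}
    path = [cell]
    while path[-1] in mother_of:
        path.append(mother_of[path[-1]])
    return path


for n in range(4):
    print(f"{2 ** n}-cell stage:", cells_at_round(n))
print("Ancestry of E:", ancestry("E"))
```

The 16-cell stage is left out on purpose, because the text does not list every division that leads to it.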
Here we explore TE expression dynamics in the C. elegans early embryo (from the zygote to the 16-cell stage), taking advantage of the single-cell RNA sequencing (scRNA-seq) dataset generated by Tintori et al. in 2016. We developed a bioinformatics pipeline aimed at quantifying TE-specific reads and analyzed if, when and where each specific class of TE is expressed during C. elegans development, as well as its potential correlations with the expression of protein-coding genes.
Data collection and pre-processing
To study the expression of TE in the C. elegans early embryo we took advantage of the public scRNA-seq data of Tintori et al. They sequenced single cells from embryos at the 1-, 2-, 4-, 8- and 16-cell stages. In total, they sequenced 219 cells, generating between 5 and 9 replicates for each of the 31 different cell types. We downloaded raw files containing 50 bp single-end reads from the ENA-EBI database (accession code PRJNA312176) and discarded the 55 samples that, according to the authors, did not pass quality filters regarding whole-embryo mRNA mass. The filtered dataset comprises 164 samples, with each cell type represented by a minimum of 4 and a maximum of 7 replicates. We report the selected samples in Additional file 2.
TE expression analysis
TE/gene expression correlation and pathways analysis
We performed a correlation analysis between the expression of the analyzed TE and all the C. elegans genes. We retrieved gene expression values (RPKM) from Supplementary Table S2 of the paper published by Tintori et al. To select TE and genes with reproducible expression among the replicates of the same cell type, we retained TE and mRNAs with an expression value ≥ 25 RPM or RPKM in at least 3 replicates of at least 1 cell type. We performed a pairwise correlation analysis between TE and coding genes using the Pearson correlation test. Pearson correlation coefficients were calculated using the pearsonr function of the scipy Python module (stats sub-module), selecting only correlations with a correlation coefficient (R) ≥ 0.4 or ≤ −0.4 and with an FDR (Benjamini & Hochberg) corrected p-value ≤ 0.0001. To analyze potential pathway enrichment for genes involved in the selected correlations, a statistical over-representation test was performed using the Panther tool (version: 13.1, reference list: Caenorhabditis elegans, Annotation Data Set: Reactome pathways, Test type: Fisher's Exact with FDR multiple test correction, FDR corrected p-value cut-off < 0.01). All the plots were generated using R software.
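The selection and correlation steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the study: only scipy.stats.pearsonr is named in the text, while the function names, the dictionary-based input layout and the hand-written Benjamini-Hochberg adjustment (applied here across all tested TE/gene pairs) are assumptions made for the example.

```python
import numpy as np
from scipy.stats import pearsonr


def is_reproducibly_expressed(values, cell_types, min_expr=25.0, min_reps=3):
    """True if expression >= min_expr in >= min_reps replicates of one cell type."""
    values = np.asarray(values, dtype=float)
    cell_types = np.asarray(cell_types)
    for cell_type in np.unique(cell_types):
        if np.sum(values[cell_types == cell_type] >= min_expr) >= min_reps:
            return True
    return False


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def correlate(te_expr, gene_expr, min_abs_r=0.4, max_fdr=1e-4):
    """Pairwise Pearson tests between TE and gene profiles (dict: name -> vector)."""
    pairs, rs, ps = [], [], []
    for te, te_vec in te_expr.items():
        for gene, gene_vec in gene_expr.items():
            r, p = pearsonr(te_vec, gene_vec)
            pairs.append((te, gene))
            rs.append(r)
            ps.append(p)
    if not pairs:
        return []
    fdr = benjamini_hochberg(ps)
    return [
        (te, gene, r, q)
        for (te, gene), r, q in zip(pairs, rs, fdr)
        if abs(r) >= min_abs_r and q <= max_fdr
    ]


# Toy data (hypothetical names and values): one TE and three genes, 60 samples.
rng = np.random.default_rng(0)
base = rng.normal(size=60)
te_expr = {"TE_toy": base}
gene_expr = {
    "gene_pos": base + rng.normal(scale=0.1, size=60),
    "gene_neg": -base + rng.normal(scale=0.1, size=60),
    "gene_noise": rng.normal(size=60),
}
for te, gene, r, q in correlate(te_expr, gene_expr):
    print(te, gene, round(r, 3), q)
```

In a real analysis, is_reproducibly_expressed would be applied to every TE and gene before calling correlate, so that only reproducibly expressed features enter the pairwise tests.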
Results and discussion
A bioinformatics pipeline to specifically measure TE expression levels
Taking advantage of the scRNA-seq dataset published by Tintori et al., we quantified TE expression in all the sampled cells. This dataset is composed of 164 samples subdivided among the 31 different cell types characterizing 5 early embryo cell stages (1-, 2-, 4-, 8- and 16-cell stages). To consider only reads effectively mapping on TE, our pipeline specifically excludes reads mapping on TE fragments embedded in annotated coding and/or long non-coding transcripts. Reads are first mapped, allowing multimapping, against a reference transcriptome made of all the annotated transcripts plus the entire species-specific TE consensus sequences from RepBase. Next, for each read, all the alignments with the best score are selected (multimapping reads may have more than one alignment with the best score). Finally, reads aligning with the best score exclusively against TE are used for TE expression quantification. This means that a read mapping with the best score against both a TE and a coding/non-coding transcript is discarded (Fig. 1a, b). This strategy keeps reads that might derive from TE fragments embedded in annotated transcripts out of the measurement of TE expression. In this work, we call TE-non-specific the reads mapping with the best alignment score on both a TE and a coding/non-coding transcript, and TE-specific the reads mapping with the best score exclusively on a TE. On average, about 80% of reads were mapped against the whole reference transcriptome (the union of coding, non-coding and TE transcripts). TE expression was low but detectable, with a median fraction of TE-specific reads of 0.1% across all the samples. Interestingly, about 20% of reads mapping with at least one best alignment on TE belong to the TE-non-specific reads. For these reads it is not possible to determine whether they originated from a coding/non-coding transcript or a TE; taking them into account might therefore bias expression level calculations.
We carried out the same expression analysis using SalmonTE, a recently published tool for TE expression. The results obtained with SalmonTE globally confirmed the general trends observed with our pipeline (Fig. 1c). However, especially in the AB descendant cells of the 16-cell stage, SalmonTE indicated generally higher TE expression levels than our pipeline. To better understand the origin of the difference between the two sets of results, we selected all the TE-non-specific reads and quantified their level for each sample. The results (Fig. 1d) showed that TE-non-specific reads are more abundant in AB-descendant cells (16-cell stage), which correspond to the samples with the highest difference between SalmonTE and our pipeline. These results suggest that the differences observed between the two pipelines are mainly due to the different usage of TE-non-specific reads and that SalmonTE, in measuring TE expression levels, might also be using reads that could derive from coding/non-coding transcripts. Intriguingly, AB cells of the 16-cell stage also give rise to neurons [34, 43, 44], which are known to be characterized by the expression of a high number of long non-coding RNAs (lncRNAs), which in turn are enriched for TE fragments [45, 46, 47]. We therefore believe that the usage of TE-non-specific reads in the quantification of TE expression might lead to an overestimation of TE expression, especially in nervous tissues, caused by the expression of annotated transcripts with embedded TE fragments. Filtering out TE-non-specific reads would lead to a more precise quantification of TE expression.
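The read classification rule described above (keep a read only if all of its best-scoring alignments fall on TE) can be illustrated with a short Python sketch. This is not the authors' implementation: the input layout (read identifier, reference name, alignment score), the function name classify_reads and the transcript names are assumptions made for the example, and a higher score is assumed to mean a better alignment.

```python
from collections import defaultdict


def classify_reads(alignments, te_names):
    """Split reads into TE-specific and TE-non-specific sets.

    alignments: iterable of (read_id, reference_name, alignment_score).
    te_names:   set of reference names that are TE consensus sequences.
    """
    by_read = defaultdict(list)
    for read_id, ref, score in alignments:
        by_read[read_id].append((ref, score))

    te_specific, te_non_specific = {}, set()
    for read_id, hits in by_read.items():
        best = max(score for _, score in hits)
        best_refs = {ref for ref, score in hits if score == best}
        te_hits = best_refs & te_names
        if not te_hits:
            continue  # best alignments fall only on annotated transcripts
        if te_hits == best_refs:
            # Every best-scoring alignment is on a TE: keep the read.
            te_specific[read_id] = sorted(te_hits)
        else:
            # Tie between a TE and an annotated transcript: discard the read.
            te_non_specific.add(read_id)
    return te_specific, te_non_specific


# Toy alignments; "geneA.t1" etc. are hypothetical transcript names.
te_names = {"CER1", "CELE45"}
alignments = [
    ("r1", "CER1", 50),
    ("r2", "CER1", 50), ("r2", "geneA.t1", 50),
    ("r3", "CELE45", 48), ("r3", "geneB.t1", 40),
    ("r4", "geneC.t1", 50),
    ("r5", "CER1", 30), ("r5", "geneA.t1", 45),
]
specific, non_specific = classify_reads(alignments, te_names)
print("TE-specific:", specific)
print("TE-non-specific:", sorted(non_specific))
```

The sketch does not decide how a TE-specific read with several equally good TE hits should be counted. It simply records all of them, since the text does not specify this step.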
TE expression changes among the stages of the C. elegans early embryo
The global TE expression profiles in each of the 31 cell types and stages (raw read counts in Additional file 3) are summarized in Fig. 2a and in Additional file 4. They show that TE abundance is particularly high in the transcriptionally inactive embryo cells (1-cell P0 zygote, 2-cell AB and P1 cells), in the 4-cell stage and in the 8-cell stage. This suggests that TE mRNAs are a component of the maternal mRNAs and are important in the initial developmental stages. A principal component analysis performed on the expression levels of all the C. elegans TE belonging to the DNA, LTR, LINE and SINE classes (Fig. 2b) shows that the 164 samples can be subdivided into two main groups. The first group mainly collects samples from the initial stages (1-, 2-, 4- and 8-cell stages), while the second group is principally composed of samples from the 16-cell stage. LTR expression determines the grouping of the 1-, 2-, 4- and 8-cell stages, while non-LTR retrotransposon (SINE and LINE) expression determines the separation of the 16-cell stage from the other cell stages, indicating that these two groups of elements have rather opposite expression dynamics. These results support the observation that LTR and non-LTR retrotransposon expression might be differentially regulated in the C. elegans early embryo.
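To illustrate how a principal component analysis can show which TE classes drive the separation of samples, the following Python sketch computes sample scores and feature loadings from a small toy matrix. It is only an illustration: the text does not state which software or preprocessing was used for Fig. 2b, so the SVD-based implementation, the absence of scaling, and the toy values and feature names are all assumptions.

```python
import numpy as np


def pca(expr, n_components=2):
    """PCA of a samples x features matrix via SVD of the column-centered data."""
    X = np.asarray(expr, dtype=float)
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # feature contributions
    explained = (S ** 2) / np.sum(S ** 2)             # variance fractions
    return scores, loadings, explained[:n_components]


# Toy expression matrix (hypothetical values): rows are samples,
# columns are two features named LTR_toy and SINE_toy.
# The first three rows mimic early-stage samples, the last three late-stage ones.
expr = [
    [100, 5],
    [90, 8],
    [110, 4],
    [10, 80],
    [12, 95],
    [8, 85],
]
scores, loadings, explained = pca(expr, n_components=2)
print("PC1 scores:", np.round(scores[:, 0], 1))
print("PC1 loadings (LTR_toy, SINE_toy):", np.round(loadings[:, 0], 2))
print("Explained variance:", np.round(explained, 3))
```

In this toy case the two features load on the first component with opposite signs, which is the kind of pattern that would indicate opposite expression dynamics between two TE classes.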
LTR expression is higher during stages associated to pluripotency maintenance and might activate the embryo innate immune response
LINE elements are mainly expressed in E lineage cells
As shown in Fig. 3b, LINE are the least expressed class of TE in the C. elegans early embryo. Overall, according to our analysis, LINE are expressed in few cell types, mainly belonging to the 16-cell stage. In particular, our results suggest that LINE are expressed in E and E precursor cells (4-cell stage EMS cell, 8-cell stage E cell and 16-cell stage Ea and Ep cells) and, at lower levels, in several AB cells of the 16-cell stage. Intriguingly, the E lineage gives rise to the intestine [34, 53], while the AB lineage gives rise to neurons and to non-neuronal tissues characterized by a high concentration of nervous connections, such as pharynx and epidermis [34, 43, 44, 54, 55]. LINE expression in intestine precursor cells was quite unexpected, whereas the expression of LINE in neurons and nervous system-associated tissues has already been observed in higher organisms [5, 18, 19, 20, 21, 22, 23] and will be discussed in the next paragraph. Our analyses showed that, unlike for LTR, no single element is capable of recapitulating the global LINE expression pattern. The general expression profile observed is the sum of different elements showing variable and element-specific expression dynamics. LINE2A and LINE2C1 are mostly expressed in the 4-cell stage EMS cell and in MS cells (16-cell stage), LINE2B is expressed in the 8-cell stage E cell and in the 16-cell stage AB and MSx1 cells, while LINE2F, whose expression is ~5-fold that of LINE2A, LINE2C1 and LINE2B, seems to be exclusively expressed in Ea and Ep cells of the 16-cell stage (Additional files 4 and 6). This suggests that different LINE elements might play different roles during C. elegans embryogenesis.
SINE are mainly expressed in AB lineage cells
Figure 3c shows SINE element expression. SINE are expressed at higher levels than LINE, but at lower levels than LTR and DNA transposons. The SINE class in the C. elegans reference genome is composed of 2 elements (SINE1 and CELE45), with CELE45 being the only one expressed in our analysis (Additional file 4). CELE45 is highly expressed in all the AB cells at the 16-cell stage, suggesting its specific expression in neuron, pharynx and epidermis precursors, as partially shown by LINE. The expression of CELE45 in AB cells at the 16-cell stage seems to be more specific than the expression of LINE, and in particular of the LINE2B element. Taken together, our results suggest that both the SINE CELE45 and the LINE LINE2B are expressed in tissues associated with the nervous system. Expression and activity of non-LTR retrotransposons have already been shown in neurons and neuronal precursors in other species. Perrat et al. showed expression and insertional activity of several TE, including LINE-like elements, in Drosophila melanogaster brains. Moreover, several studies proposed that LINE elements are expressed and actively retrotransposed in neuronal precursors during differentiation of the central nervous system, inducing somatic mosaicism and increasing neuronal plasticity in mouse and human brains [18, 23]. Therefore, we speculate that activation of non-LTR elements in C. elegans nervous cells during development may play a role in neuronal cell fate specification, leading to neuronal cell diversity and possibly affecting neural plasticity and synapse formation.
DNA TE have heterogeneous expression profiles
DNA TE (Fig. 3d) are expressed at higher levels than SINE and LINE but at lower levels than LTR. DNA transposons are the most abundant TE in the C. elegans genome and they are the only class previously suggested to be active in the C. elegans genome [32, 33]. Their global expression is relatively constant throughout the analyzed stages and cell types. The most expressed DNA TE are Chapaev1, CEMUDR1, PALTA3, and PALTTTAAA3 (Additional files 4 and 7) and, intriguingly, these 4 TE have very different expression profiles. Chapaev1 is constantly expressed among the early embryo cell types and its expression recapitulates the overall expression of DNA transposons. CEMUDR1 is expressed in the 1-, 2-, 4- and 8-cell stages, with an expression profile similar to the one shown by LTR elements. PALTA3 and PALTTTAAA3 are lowly expressed in the 1-, 2- and 4-cell stages; their expression increases at the 8-cell stage and reaches its highest level in the AB cells of the 16-cell stage. This expression profile is very similar to the one shown by LINE2B and CELE45. These results suggest that DNA transposons have heterogeneous expression profiles that can be divided into the following types: i) constant, ii) LTR-like and iii) non-LTR-like. DNA transposons are therefore the only TE class constantly expressed in all the cell types of the C. elegans early embryo.
Expression of LTR elements correlates with the expression of genes associated to the innate immune response
Number of positive and negative correlations for the 11 selected TEs
Several studies have recently reported the expression of TE in mammalian embryos and the CNS, suggesting their role in fundamental biological processes such as pluripotency maintenance, embryo viability and differentiation, brain functioning, evolution and diversification [2, 12, 13, 14, 18, 19, 20, 21, 22]. In this study we developed a bioinformatics pipeline able to quantify reads specifically mapping on TE and explored TE expression in the C. elegans early embryo, from the zygote to the 16-cell stage. Our results suggest that, especially in neural tissues, a portion of reads mapping on TE cannot be distinguished from reads deriving from TE fragments embedded in annotated transcripts. These non-specific reads should therefore be discarded to avoid biases in the estimation of TE expression. In addition, our data show that TE are expressed in the C. elegans embryo and that, despite their low level of expression, they present different expression profiles in different embryonic stages and cell types, suggesting a specific regulation during early development. We observed a clear split of developmental TE expression levels into two phases characterized by the expression of two different families of TE, LTR and non-LTR. LTR elements were mostly expressed in the initial stages (1-, 2-, 4- and 8-cell stages). In particular, according to timing and territories of expression, we propose that LTR expression (mainly of the LTRCER1 and CER1 elements) in the initial developmental stages might play a role in the maintenance of pluripotency and/or the activation of the innate immune response. We also observed that LINE are mostly expressed in intestine precursor cells (E lineage) and, together with CELE45 (SINE), in 16-cell stage AB cells, the ones giving rise to neurons and tissues connected with the nervous system.
These results are consistent with the observations reporting the expression of non-LTR elements in nervous tissues of other organisms such as fruit fly, mouse and human [5, 18, 19, 20, 21, 22, 23]. DNA transposons are the most abundant TE fixed in the C. elegans genome and, according to our results, the only TE class expressed in all the cell types of the C. elegans early embryo. Overall, DNA transposons are constantly expressed and comprise TE with heterogeneous expression profiles that can be summarized as: i) constant (Chapaev1), ii) LTR-like (CEMUDR1) and iii) non-LTR-like (PALTA3 and PALTTTAAA3).
To our knowledge this is the first report analyzing the expression of TE in the C. elegans early embryo, and no work on the effects of TE silencing during C. elegans development has ever been performed. In this work we have tried to support our speculations by reasoning in a broader evolutionary context, taking into account experiments made in other organisms. Experiments of TE silencing in developmental and/or cellular contexts have been performed mainly in cultures of mammalian pluripotent stem cells, and no comprehensive inspections of the effects of TE silencing during the whole development of an entire embryo have been reported. In 2004 Park and colleagues silenced, in the mouse zygote, the MT transposon-like element, which belongs to the LTR family and is expressed in the oocyte. The silencing blocked the division of the zygote, suggesting a fundamental role played by a transposon during mouse embryogenesis. Lu et al. silenced the LTR retrotransposon HERVH in hESC and observed a morphological change, with cells adopting a fibroblast-like appearance. Furthermore, they also described a significant up-regulation of HERVH during the reprogramming of fibroblasts into induced pluripotent stem cells (iPSCs), supporting the involvement of the HERVH retrotransposon in the maintenance of the pluripotency state in hESCs. Future RNAi experiments on TE in C. elegans embryos might establish whether the expression of TE has any functional role. Starting from our results, a first indicative experiment would consist in silencing the most expressed LTR elements, LTRCER1 and CER1, followed by measuring the embryo's susceptibility to viral and bacterial attacks and its capability to correctly develop and differentiate.
We propose that, despite the low level of expression, TE transcription is finely regulated during the early embryo development of C. elegans and might be involved in specific developmental functions, in agreement with, and reinforcing, what has already been observed in more complex organisms.
The authors would like to thank the guest editors and the anonymous reviewers. Furthermore, the authors would like to thank Dr. Greta Busseni (Integrative Marine Ecology - Stazione Zoologica Anton Dohrn, Napoli, Italy) for her support, as well as the authors of the Tintori et al. article and the members of the Goldstein lab, without whose publicly available data this study would not have been possible.
About this supplement
This article has been published as part of BMC Bioinformatics, Volume 20 Supplement 9, 2019: Italian Society of Bioinformatics (BITS): Annual Meeting 2018. The full contents of the supplement are available at https://bmcbioinformatics.biomedcentral.com/articles/supplements/volume-20-supplement-9.
RS, FA, SG designed the study; FA performed all analyses and wrote the manuscript; RS supervised the analyses, wrote and reviewed the manuscript; SG, MS and EDS reviewed the manuscript; all the authors approved the final version of the manuscript.
Publication costs are funded by Istituto Italiano di Tecnologia (IIT).
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
- 33. Bessereau J-L. Transposons in C. elegans. WormBook. 2006; Available from: http://www.wormbook.org/chapters/www_transposons/transposons.html. Cited 27 Mar 2018.
- 39. Picard tool - Broad Institute. Available from: http://broadinstitute.github.io/picard/. Cited 10 Mar 2018.
- 43. Altun ZF, Hall DH. Nervous system, general description. WormBook. 2011; Available from: http://www.wormatlas.org/hermaphrodite/nervous/Neuroframeset.html. Cited 28 Mar 2018.
- 44. Hobert O. Neurogenesis in the nematode Caenorhabditis elegans. WormBook. 2010; Available from: http://www.wormbook.org/chapters/www_specnervsys.2/neurogenesis.html. Cited 27 Mar 2018.
- 49. Nance J, Lee J-Y, Goldstein B. Gastrulation in C. elegans. WormBook. 2005; Available from: http://www.wormbook.org/chapters/www_gastrulation/gastrulation.html. Cited 11 Sept 2018.
- 50. Maduro MF. Cell fate specification in the C. elegans embryo. Dev Dyn. 2010:1315–29.
- 53. McGhee J. The C. elegans intestine. WormBook. 2007; Available from: http://www.wormbook.org/chapters/www_intestine/intestine.html. Cited 5 July 2018.
- 55. Altun ZF, Hall DH. Alimentary System, Pharynx. WormBook. 2009; Available from: http://www.wormatlas.org/hermaphrodite/pharynx/Phaframeset.html. Cited 28 Mar 2018.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.