Expression dynamics and relations with nearby genes of rat transposable elements across 11 organs, 4 developmental stages and both sexes
Transposable elements (TEs) pervade mammalian genomes. However, compared with mice, fewer studies have focused on TE expression patterns in the rat, particularly on comparisons across different organs, developmental stages and sexes. In addition, TEs can influence the expression of nearby genes, but the temporal and spatial extent of this influence remains unclear.
To evaluate TE transcription patterns, we profiled TE transcript levels in 11 organs, for both sexes, across four developmental stages of the rat. The results show that most short interspersed elements (SINEs) are commonly expressed in all conditions, and that SINEs are the major TE type among commonly expressed TEs. In contrast, long terminal repeats (LTRs) are more likely to exhibit specific expression patterns. The expression tendencies of TEs and genes are similar in most cases. For example, few genes and TEs are specifically expressed in the liver, muscle and heart. However, TEs outperform genes in classifying organs, which implies that their organ specificity is higher than that of genes. By associating TEs with their closest genes in the genome, we find that their expression levels are correlated and that, in some cases, the correlation is independent of the distance between them.
TEs associate with their nearest genes in a sex-dependent manner, and a gene can be associated with more than one TE. Our work can help to functionally annotate the genome and to further understand the role of TEs in gene regulation.
Keywords: Transposable elements, Rats, Expression patterns, Correlation, Organ, Sex, Age
Adverse drug reactions
Differentially expressed genes
Differentially expressed TEs
Downstream of TSSs
Long interspersed elements
Long terminal repeats
Pearson correlation coefficient
Principal variance component analysis
Repeat analysis pipeline
Reads Per Kilobase of exon model per Million mapped reads
Short interspersed elements
Transcription factor binding sites
Upstream of TSSs
Rats and mice have been the most widely used models in biomedical research and drug development for many years [1, 2, 3]. However, a shift has taken place in which mice have rapidly overtaken rats as the model of choice. As a result, the proportion of publications using mouse models has increased from about 20% in the 1970s and 1980s to over 50% in recent neuroscience-related research. This shift might result from gene knockout technology, which was first applied in mice rather than in rats. Nevertheless, the rat is the preferred animal model for studies of physiology, toxicology, nutrition, behavior and neoplasia. In addition, the use of rats can reduce the spread of drugs following intracranial injections. These advantages create an urgent demand to study gene regulation patterns in the rat. Benefiting from the creation and evolution of the Rat Genome Database (RGD) and the completion of the rat genome sequence in 2004, we can now look more deeply into genetic rat models.
Transposable elements (TEs) were first discovered in maize and described as “controlling elements” of nearby genes. TEs have since been found in almost all species, with proportions varying from ~1% of the genome in Fusarium graminearum to ~85% in maize [10, 11, 12]. They can be categorized into retrotransposons and DNA transposons. The former are amplified through a copy-and-paste mechanism with an element-encoded RNA intermediate, while the latter use a cut-and-paste mechanism to self-propagate with a DNA intermediate [13, 14]. Retrotransposons can be further subdivided into long terminal repeats (LTRs), long interspersed elements (LINEs) and short interspersed elements (SINEs). L1 elements are the main retrotransposons in mammalian genomes, with important roles in mutagenesis and early cancer diagnosis [16, 17]. New active TE integrations are usually removed from the population by purifying selection, although high levels of methylation can buffer this effect and allow further adaptation and functionalization [18, 19]. TEs can function as transcription factor binding sites (TFBSs), enhancers, alternative promoters, cryptic splice sites, polyadenylation signals and insulators, or can modulate RNA abundance and shape RNA-protein regulatory networks [20, 21, 22, 23, 24]. In particular, as enhancers, TEs can cause a new group of genes to be expressed together and accelerate the formation of complex new pathways and functions.
Previous studies have suggested that some TE subfamilies may be transcriptionally activated in particular tissues or under environmental stress. For example, a subset of maize TE families can be activated in response to abiotic stress, including cold, heat, high salt or UV stress. The expression of TEs in Drosophila melanogaster shows stage specificity across 27 developmental stages, especially for TART-B, the copia element and Tom1. In addition, several individual TEs have been documented to influence the expression of nearby genes [28, 29, 30]. Faulkner et al. first demonstrated that TEs are an integral part of the transcriptome, and that their transcripts are generally tissue specific and can influence the transcriptional output of the human and mouse genomes. The rice DNA transposon mPing results in up-regulation of nearby genes in response to cold or salt stress. Lynch et al. demonstrated that ancient TEs can donate cis-regulatory elements to recruited genes, especially in human decidual stromal cells, in which 194 ancient TEs were enriched within cis-regulatory elements. Although many reports have shown that some TEs are tissue specific and can influence the expression of nearby genes, the range and time course of this influence remain unclear.
In this paper, we focused on the expression patterns of TEs and their relations with the closest genes in different organs, sexes and ages of the rat by using RNA-seq data. Traditional methods that consider only uniquely mapped reads underestimate the expression signal of TEs, because TEs usually have high copy numbers. In this study, we adopted the iteres tool to estimate the expression levels of TE subfamilies because it can handle non-uniquely mapped reads.
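To illustrate why discarding multi-mapped reads underestimates TE expression, the following minimal sketch compares unique-only counting with a simple fractional (1/n) assignment at the subfamily level. The fractional scheme is an assumption chosen for illustration and is not claimed to be the algorithm implemented in iteres; the read and subfamily names are hypothetical.

```python
from collections import defaultdict

def count_unique_only(alignments):
    """Count a read only if it aligns to exactly one genomic location."""
    counts = defaultdict(float)
    for hits in alignments.values():
        if len(hits) == 1:
            counts[hits[0]] += 1.0
    return dict(counts)

def count_fractional(alignments):
    """Give each of a read's n alignments a weight of 1/n (illustrative scheme)."""
    counts = defaultdict(float)
    for hits in alignments.values():
        weight = 1.0 / len(hits)
        for subfamily in hits:
            counts[subfamily] += weight
    return dict(counts)

# Hypothetical reads; each list holds the subfamily of every aligned location.
alignments = {
    "read1": ["subfamA"],                        # uniquely mapped
    "read2": ["subfamA", "subfamA", "subfamA"],  # three copies of one subfamily
    "read3": ["subfamB", "subfamB"],             # two copies of one subfamily
    "read4": ["subfamB", "subfamA"],             # ambiguous between subfamilies
}

unique_counts = count_unique_only(alignments)
fractional_counts = count_fractional(alignments)
print("unique only:", unique_counts)
print("fractional: ", fractional_counts)
```

In this toy example only one of the four reads survives the unique-only filter, and subfamB receives no signal at all, although three reads carry information about the two subfamilies.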
This study is organized around two main themes. First, TEs are spread throughout the whole genome, and this distribution raises the question of whether the expression of TE subfamilies is organ-, age- and sex-dependent and, if so, what the pattern is. We examined the expression profiles of TEs and found that the fraction of differentially expressed TEs (DETEs) varied greatly among organs, developmental stages and sexes. Most SINEs were commonly expressed in all conditions, and SINEs were the major TE type among commonly expressed TEs. In contrast, LTRs were more likely to show specific expression patterns. Second, we estimated the Pearson correlation coefficient (PCC) of the expression signals between each individual TE and its nearest gene. In some cases, the PCC was independent of the distance between the TE and its nearest gene. Some LTRs associated with their nearest genes in a sex-dependent manner.
TE distribution in various genomic compartments
To examine whether TEs tend to be located in a specific genomic compartment, we estimated TE distributions in CDS exons, UTR exons, introns and intergenic regions. For this, we used the intersect tool from the BEDTools package v2.26.0 and required a minimal overlap fraction of 50%. When a TE was located in multiple genomic compartments, it was assigned to a single compartment according to the following priority: CDS exons > UTR exons > Introns > Intergenic regions. For example, if a TE region overlapped both a UTR exon and an intron, it was assigned to UTR exons.
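The priority rule can be sketched in a few lines of Python. This is an illustration of the assignment logic only, not the BEDTools-based pipeline itself; the TE identifiers and overlap sets are hypothetical, and the fallback to the intergenic class for a TE with no qualifying overlap is an assumption of the sketch.

```python
# Priority order stated in the text: CDS exons > UTR exons > Introns > Intergenic.
PRIORITY = ["CDS exon", "UTR exon", "Intron", "Intergenic"]

def assign_compartment(overlapping):
    """Return the highest-priority compartment among those a TE overlaps.

    `overlapping` holds the compartments that passed the 50% minimal-overlap
    filter. An empty collection falls back to "Intergenic" (assumption).
    """
    for compartment in PRIORITY:
        if compartment in overlapping:
            return compartment
    return "Intergenic"

# Hypothetical TEs and the compartments each one overlaps.
te_overlaps = {
    "TE_a": {"UTR exon", "Intron"},   # the example given in the text
    "TE_b": {"Intron"},
    "TE_c": {"CDS exon", "UTR exon", "Intron"},
    "TE_d": set(),
}

assigned = {te: assign_compartment(hits) for te, hits in te_overlaps.items()}
print(assigned)
```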
Data sources and data processing
RNA-seq data sets for the rat were obtained from Yu et al. A total of 320 samples cover 11 organs: Adrenal gland (Ad), Brain (Br), Heart (He), Kidney (Ki), Liver (Li), Lung (Lu), Muscle (Mu), Spleen (Sp), Thymus (Th), Testis (Te) and Uterus (Ut). Each organ was studied at four developmental stages: 2, 6, 21 and 104 weeks old. Both sexes were studied for each organ except Te and Ut. There were four biological replicates for each combination of organ, age and sex. Accordingly, there are 9 (organs) × 4 (ages) × 2 (sexes) × 4 (biological replicates) + 2 (organs) × 4 (ages) × 1 (sex) × 4 (biological replicates) = 320 samples.
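The sample arithmetic can be checked by enumerating the design explicitly. The sketch below uses the organ abbreviations from the text; the sex labels "F" and "M" and the tuple layout are assumptions made for illustration.

```python
from itertools import product

ORGANS_BOTH_SEXES = ["Ad", "Br", "He", "Ki", "Li", "Lu", "Mu", "Sp", "Th"]
SEX_SPECIFIC_ORGANS = {"Te": "M", "Ut": "F"}   # testis and uterus: one sex each
AGES_WEEKS = [2, 6, 21, 104]
REPLICATES = [1, 2, 3, 4]

# Organs sampled in both sexes: 9 x 4 x 2 x 4 = 288 samples.
samples = [
    (organ, age, sex, rep)
    for organ, age, sex, rep in product(ORGANS_BOTH_SEXES, AGES_WEEKS, "FM", REPLICATES)
]
n_both_sexes = len(samples)

# Sex-specific organs: 2 x 4 x 1 x 4 = 32 samples.
samples += [
    (organ, age, sex, rep)
    for (organ, sex), age, rep in product(SEX_SPECIFIC_ORGANS.items(), AGES_WEEKS, REPLICATES)
]

# Organ-by-age groups among the organs sampled in both sexes.
groups = {(organ, age) for organ, age, _, _ in samples if organ in ORGANS_BOTH_SEXES}

print(n_both_sexes, len(samples), len(groups))
```

The 288 samples from the nine organs studied in both sexes fall into 9 × 4 = 36 organ-by-age groups of eight samples each (four female and four male replicates).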
In this study, a TE subfamily was considered to be expressed if its averaged RPKM was ≥ 1. A TE subfamily was defined as a “commonly expressed TE” if it was expressed in all organs, developmental stages and sexes. Circos was used to draw the number of DETEs among organs and the links between organs and classes. TE subfamilies were clustered using average linkage in MATLAB. Principal variance component analysis (PVCA) leverages the strengths of principal components analysis and variance components analysis to quantify the proportion of variation attributable to each effect. In this study, it was used to quantify the relative contributions of the effects (organ, age, sex and replicate) to the total model variance, based on the expression matrix of TE subfamilies across samples.
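A minimal sketch of the expression threshold and the “commonly expressed” definition follows. It uses the standard RPKM definition and assumes that the average is taken over the biological replicates of each condition; the subfamily names, condition labels and values are hypothetical.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads (standard definition)."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)

def is_expressed(replicate_rpkms, threshold=1.0):
    """Expressed if the RPKM averaged over replicates is >= threshold (assumption)."""
    return sum(replicate_rpkms) / len(replicate_rpkms) >= threshold

def commonly_expressed(expression):
    """Return the subfamilies that are expressed in every condition.

    `expression` maps subfamily -> {condition: [replicate RPKM values]}.
    """
    return {
        subfamily
        for subfamily, conditions in expression.items()
        if all(is_expressed(values) for values in conditions.values())
    }

# Hypothetical data for two subfamilies in two of the conditions.
expression = {
    "subfamA": {"Li_F_2w": [2.0, 1.5, 1.8, 2.2], "Br_M_6w": [1.1, 0.9, 1.2, 1.0]},
    "subfamB": {"Li_F_2w": [0.2, 0.4, 0.1, 0.3], "Br_M_6w": [3.0, 2.5, 2.8, 3.1]},
}

example_rpkm = rpkm(read_count=500, length_bp=1000, total_mapped_reads=10_000_000)
common = commonly_expressed(expression)
print(example_rpkm, common)
```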
Identification of differentially expressed and organ-enriched TE subfamilies
To allow comparison with gene expression, we adopted the same methods as Yu et al. to identify enriched TEs. For completeness, we describe these methods briefly.
A TE subfamily was defined as a DETE between two organs if the Bonferroni-corrected P-value of a t-test was ≤ 0.05 and the fold change (FC) was ≥ 2 (overexpressed) or ≤ 0.5 (underexpressed). The organ-enriched TE subfamilies of an organ were defined as the intersection of the DETEs that were overexpressed in that organ relative to each of the other 10 organs. Development-dependent DETEs were evaluated by comparing different developmental stages within each organ, using the same criteria (FC ≥ 2 or ≤ 0.5 and Bonferroni-corrected P-value ≤ 0.05). Excluding the testis and uterus samples, the other 288 samples were separated into 36 groups according to organ and developmental stage. FC and t-test were also computed between males and females in each group to identify sex-dependent DETEs.
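The decision rule for a DETE and the intersection used for organ-enriched subfamilies can be sketched as follows. The raw P-values are assumed to come from a t-test computed elsewhere, and the number of tests used for the Bonferroni correction (here the 855 subfamilies) is an assumption, since the text does not state it. Subfamily names are hypothetical, and only two of the ten comparison organs are shown for brevity.

```python
def call_dete(fold_change, p_value, n_tests, alpha=0.05):
    """Classify one subfamily in one comparison as 'over', 'under' or 'not DE'."""
    p_adjusted = min(1.0, p_value * n_tests)   # Bonferroni correction
    if p_adjusted > alpha:
        return "not DE"
    if fold_change >= 2:
        return "over"
    if fold_change <= 0.5:
        return "under"
    return "not DE"

def organ_enriched(overexpressed_vs_other):
    """Intersect the sets overexpressed in the target organ versus each other organ.

    `overexpressed_vs_other` maps other_organ -> set of subfamilies that are
    overexpressed in the target organ relative to that organ.
    """
    return set.intersection(*overexpressed_vs_other.values())

N_TESTS = 855  # assumption: one test per TE subfamily

calls = [
    call_dete(3.0, 1e-6, N_TESTS),   # strong and significant
    call_dete(3.0, 1e-3, N_TESTS),   # fails after correction
    call_dete(0.4, 1e-6, N_TESTS),   # significant decrease
    call_dete(1.5, 1e-9, N_TESTS),   # significant but FC too small
]

liver_vs_others = {"Br": {"subfamA", "subfamB"}, "He": {"subfamA", "subfamC"}}
enriched = organ_enriched(liver_vs_others)
print(calls, enriched)
```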
In each organ, the FC was calculated between two adjacent developmental stages, with the older stage as the numerator: 104 versus 21 weeks old, 21 versus 6 weeks old and 6 versus 2 weeks old.
A TE subfamily with a Bonferroni-corrected P-value ≤ 0.05 was assigned to the “up” pattern if FC ≥ 2 or to the “decrease” pattern if FC ≤ 0.5. All other TE subfamilies were assigned to “maintain”. Therefore, a TE subfamily could be assigned to 1 of 27 patterns in each organ, ranging from up-up-up (UUU) through maintain-maintain-maintain (MMM) to decrease-decrease-decrease (DDD).
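The three-letter time-course labels can be sketched as below. Each of the three adjacent-stage comparisons yields U, M or D, giving 3³ = 27 possible patterns. The letter order (chronological, 6 vs 2 weeks first) is an assumption of this sketch, and the input values are hypothetical.

```python
from itertools import product

def step_label(fold_change, p_adjusted, alpha=0.05):
    """Label one adjacent-stage comparison as up (U), decrease (D) or maintain (M)."""
    if p_adjusted <= alpha and fold_change >= 2:
        return "U"
    if p_adjusted <= alpha and fold_change <= 0.5:
        return "D"
    return "M"

def time_course_pattern(steps):
    """Build the pattern from three (FC, adjusted P) pairs.

    Assumed order: 6 vs 2 weeks, 21 vs 6 weeks, 104 vs 21 weeks, each with the
    older stage as the FC numerator.
    """
    return "".join(step_label(fc, p) for fc, p in steps)

# All possible patterns: three letters, three choices each.
ALL_PATTERNS = ["".join(letters) for letters in product("UMD", repeat=3)]

example = time_course_pattern([(2.5, 0.01), (1.1, 0.50), (0.3, 0.001)])
stable = time_course_pattern([(2.5, 0.20), (1.0, 1.00), (0.9, 0.80)])
print(len(ALL_PATTERNS), example, stable)
```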
The single-end RNA-seq data of 320 samples from Yu et al. were used to quantify the expression levels of TE subfamilies as well as of individual TEs. In addition, we associated TEs with genes by using the distance between TEs and transcriptional start sites (TSSs). A flowchart of the whole work is shown in Fig. 1.
Proportion of TEs in various genomic compartments
Quantification and evaluation of the expression signal of TE subfamilies
Mammalian TEs are hierarchically divided into classes, families and subfamilies. The following analysis focused mainly on four major classes, LINE, SINE, LTR and DNA, which comprise 56 families and 855 subfamilies in the rat. We used BWA v0.7.12 and iteres to map reads and quantify the expression levels of TE subfamilies (Fig. 1a). The final expression matrix, consisting of 855 subfamilies across 320 samples, was used for further analysis. After normalization, the pairwise PCC of TE subfamily expression levels was calculated between any two of the four biological replicates (Additional file 2). High reproducibility was detected for each sample group, with PCC values from 0.9228 to 0.9847 and standard errors from 0.0013 to 0.0412.
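The replicate reproducibility check can be illustrated with a plain-Python Pearson correlation over all pairs of replicates; four replicates give six pairs per sample group. The expression vectors below are hypothetical and far shorter than the 855-subfamily profiles used in the study.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical normalized expression of five subfamilies in four replicates.
replicates = {
    "rep1": [1.0, 5.0, 10.0, 0.5, 3.0],
    "rep2": [1.2, 4.8, 9.5, 0.4, 3.3],
    "rep3": [0.9, 5.5, 10.4, 0.6, 2.8],
    "rep4": [1.1, 5.1, 9.9, 0.5, 3.1],
}

pairwise_pcc = {
    (a, b): pearson(replicates[a], replicates[b])
    for a, b in combinations(replicates, 2)
}
for pair, value in pairwise_pcc.items():
    print(pair, round(value, 4))
```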
The distribution of commonly expressed TEs in four classes
We then investigated the effects of organ, age and sex on TE expression by using PVCA (Fig. 2e). We found that ~95% of the variance resulted from organ, while each of the other effects contributed less variance than the residual of the model. It should be noted that the Y chromosome has not been sequenced for the rat, which would lead to an underestimate of the effect of sex. We therefore looked more closely at organ-related TE subfamily expression patterns. The results suggested that TE expression patterns were similar to those of genes. For example, for both TEs and genes, the largest number of expressed elements was found in the lung and testis, and the smallest in the liver, muscle and heart (Additional file 3). At the subfamily level, on average 184 (21.52%) TE subfamilies were expressed in each organ. At the class level, the proportions of expressed TE classes were similar across the 11 organs (Fig. 2c). The proportion of expressed LTRs was significantly lower than the background. The numbers of expressed LTR subfamilies and of specifically expressed LTR subfamilies were highest in the testis among all organs (Additional file 1: Table S9).
We then performed a hierarchical cluster analysis to obtain an overview of TE expression patterns across the 320 samples. The clustering of TE expression profiles suggests that organ has a substantial effect on the transcriptome, except for the testis (Fig. 2d). Compared with the clustering of gene expression profiles, the TE clustering performed better in organ discrimination, since the gene-based clustering classified one of the four developmental stages of thymus as spleen.
The identification of DETEs
Comparisons were performed between any two developmental stages in each organ to evaluate development-dependent DETEs. We identified 84 DETEs that appeared in at least one of the 11 organs. The number of DETEs varied considerably among organs and developmental stages (Additional file 1: Table S5). When compared with 2-week-old rats, a number of DETEs were detected at the other developmental stages, which is similar to the report by Yu et al. Among all organs, the testis contained the most development-dependent DETEs. Comparisons within young (6- and 21-week-old) and within atrophying (2- and 104-week-old) testes showed only a handful of DETEs. We also performed a time course analysis by comparing any two adjacent developmental stages to evaluate alterations in transcriptomic activity through the life cycle of the rat (Methods). Each TE could be grouped into one of the 27 possible patterns. The number of subfamilies for each pattern in each organ is shown in Fig. 3b. MMM was the most frequently observed expression pattern, indicating a stable expression level over the lifespan. In addition, DMM, UMM, MMD and MMU were also frequently observed.
Finally, we identified sex-dependent DETEs in each organ and developmental stage. Some TEs were differentially expressed between female and male rats, especially in the kidney and liver (Additional file 4). Half of the sex-dependent DETEs were observed in 6-week-old rats and ~34.6% in 21-week-old rats (Fig. 3c; Additional file 1: Tables S6 and S11). This may result from adolescence and sexual maturity. There were only four DETEs in 104- and 2-week-old rats, possibly because of organ atrophy in aging rats or incomplete development in juveniles. Of the sex-dependent DETEs, 84.6% belonged to LTRs and the others belonged to DNA transposons. In other words, LTRs showed sex-dependent expression, so we hypothesized that LTRs associate with nearby genes in a sex-dependent manner.
As mentioned above, the organ, development and sex dependence of DETEs exhibited patterns consistent with those of differentially expressed genes (DEGs). We therefore further investigated the associations between genes and TEs.
The relations between genes and TEs
In more detail, we evaluated the expression levels of individual TEs by using BWA v0.7.12 and Cufflinks v2.2.1, and then obtained the nearest gene of each TE by using the closest tool from the BEDTools package v2.26.0. Since TEs may affect the expression of proximal genes [61, 62], we calculated the PCC of the expression levels between each TE and its nearest gene, and then investigated whether the PCC was related to their distance.
The same process was performed on the sex-dependent DEG-TE pairs. Some sex-dependent DEGs were detected in multiple organs and ages. Interestingly, more significant correlation peaks were detected for sex-dependent DEGs shared by more organs/ages (Additional file 6). We focused on sex-dependent DEG-TE pairs in which the gene appeared once or five times. The latter showed a higher median and frequency (Additional file 7). This indicates that some TEs associate with proximal genes in a sex-dependent manner, and more than half of these TEs were LTRs.
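The idea of pairing each TE with its nearest gene can be sketched as a nearest-TSS lookup on a single chromosome. This is a simplified stand-in for the BEDTools closest step, not a reimplementation of it: it assumes each TE is represented by one coordinate (for example its midpoint), ignores strand, and uses hypothetical gene names and positions.

```python
from bisect import bisect_left

def nearest_tss(te_position, tss_sorted):
    """Return (gene, absolute distance in bp) for the TSS closest to a TE.

    `tss_sorted` is a list of (position, gene) tuples for one chromosome,
    sorted by position. Only the two TSSs flanking the TE need to be compared.
    """
    positions = [pos for pos, _ in tss_sorted]
    i = bisect_left(positions, te_position)
    candidates = tss_sorted[max(i - 1, 0): i + 1]
    pos, gene = min(candidates, key=lambda item: abs(item[0] - te_position))
    return gene, abs(pos - te_position)

# Hypothetical TSS annotation and TE coordinates on one chromosome.
tss_sorted = [(1000, "GeneA"), (5000, "GeneB"), (9000, "GeneC")]
te_positions = {"TE_1": 500, "TE_2": 4200, "TE_3": 9500}

pairs = {te: nearest_tss(pos, tss_sorted) for te, pos in te_positions.items()}
print(pairs)
```

Once each TE is paired with a gene and a distance, the PCC of their expression profiles across samples can be examined as a function of that distance.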
Development-dependent gene-TE pairs were similarly analyzed and similar results were obtained (Additional files 8 and 9). For the genes with UUU expression pattern, significant correlations were observed with their closest TEs.
We focused on both the expression of TEs and the spatiotemporal influence of TEs on genes. Because of the high copy number of TEs, we adopted iteres to calculate the expression levels of TEs in different samples. Most commonly expressed TEs were SINEs, and most SINEs were commonly expressed. It has been reported that Alu family elements, which belong to the SINEs, are enriched near housekeeping genes [64, 65], and that the distribution of SINEs is conserved across species. Most specifically expressed TEs were LTRs. These phenomena might result from TE-derived TF binding sites that SINEs provide for nearby genes and from tissue-specific regulation of LTRs [21, 49, 51, 55]. Except for the significantly different expression level of LTRs in the testis, TE classes showed even expression levels among organs. Hierarchical cluster analysis showed that, compared with genes, TE expression better represented differences between organs. We used PVCA to quantify the sources of variance in TE expression across organs, ages, sexes and biological replicates. The results suggested that the differences mainly resulted from organ. The low variance attributed to sex might be because the Y chromosome was not sequenced in this dataset. This was consistent with the subsequent differential expression analysis: compared with the number of organ-dependent DETEs, only 18 subfamilies showed sex-dependent expression, and these were dominated by LTRs. Based on these findings, we hypothesized that TEs may associate with nearby genes in a sex-dependent manner, and our results support this hypothesis. DEGs and DETEs showed similar expression patterns, such as underexpression in the liver, muscle and heart and overexpression in the testis and brain. Most sex-dependent DEGs and DETEs were found at 6 or 21 weeks. Few genes and TEs changed continuously (UUU, DDD) through the lifespan.
TEs are usually considered deleterious or neutral elements of genomes, but this effect can be buffered to allow further adaptation and functionalization [18, 26, 67]. This may lead to interplay between TEs and genes, which may make important functional contributions to tissues, ages or sexes. We linked each individual TE to its nearest gene and then calculated the linear correlation of their expression signals and the distance within each gene-TE pair. The results indicated that most TEs were positively correlated with their nearby genes at the expression level. In some cases, the PCC did not depend on the distance between a gene and its nearest TE.
This study presented a comprehensive analysis of TE expression patterns with respect to organ type, time course and sex. Our results suggest that most SINEs were commonly expressed in all conditions and that SINEs were the major TE type among commonly expressed TEs. In contrast, LTRs were more likely to exhibit specific expression patterns, and most specifically expressed TEs were LTRs. DEGs and DETEs showed similar expression patterns. Furthermore, the temporal and spatial influences of TEs on genes were evaluated. The results indicated positive PCCs between most TEs and their nearby genes at the expression level. In some cases, the PCC did not depend on the distance between a gene and its nearest TE.
In this paper, we used a pipeline to calculate the expression levels of TE subfamilies in different organs, ages and sexes. The pipeline could also be applied to other conditions, such as biotic stress and environmental change. Our work could improve the understanding of gene regulation in the rat.
We thank Prof. Leming Shi for providing the original RNA-seq data.
This work was financially supported by the National Natural Science Foundation of China [21375090, 21675114]. The funding bodies did not have a role in the design of the study, data collection, analysis, interpretation of data, writing the manuscript, or the decision to publish.
Availability of data and materials
The raw data were obtained from the NCBI database under accession number SRP037986. The gene expression values were obtained from http://pgx.fudan.edu.cn/ratbodymap/index.html. All data generated or analyzed in this study are included in this published article and its supplementary information.
YD, ZL, YY, YL and ML made substantial contributions to conception and design. ZW, YL and YY participated in the acquisition of data. YD, ZH and QK analyzed and interpreted the data. YD and ZH wrote the manuscript. All authors read and approved the final manuscript.
No human or animal material has been directly used, as the study has used publicly available datasets.
Consent for publication
The authors declare that they have no competing interests.
- 21. Fort A, Hashimoto K, Yamada D, Salimullah M, Keya CA, Saxena A, Bonetti A, Voineagu I, Bertin N, Kratz A, et al. Deep transcriptome profiling of mammalian stem cells supports a regulatory role for retrotransposons in pluripotency maintenance. Nat Genet. 2014;46(6):558–66.
- 33. Lynch VJ, Nnamani MC, Kapusta A, Brayer K, Plaza SL, Mazur EC, Emera D, Sheikh SZ, Gruetzner F, Bauersachs S, et al. Ancient transposable elements transformed the uterine regulatory landscape and transcriptome during the evolution of mammalian pregnancy. Cell Rep. 2015;10(4):551–61.
- 42. Li J, Bushel PR, Chu TM, Wolfinger RD. Principal variance components analysis: estimating batch effects in microarray gene expression data. Batch Effects and Noise in Microarray Experiments: Sources and Solutions. 2009; doi: 10.1002/9780470685983.ch12.
- 63. Veselovska L, Smallwood SA, Saadeh H, Stewart KR, Krueger F, Maupetit-Mehouas S, Arnaud P, Tomizawa S, Andrews S, Kelsey G. Deep sequencing and de novo assembly of the mouse oocyte transcriptome define the contribution of transcription to the DNA methylation landscape. Genome Biol. 2015;16:209.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.