A single factor dominates the behavior of rhythmic genes in mouse organs
Circadian rhythm, regulated by both internal and external environment of the body, is a multi-scale biological oscillator of great complexity. On the molecular level, thousands of genes exhibit rhythmic transcription, which is both organ- and species-specific, but it remains a mystery whether some common factors could potentially explain their rhythmicity in different organs. In this study we address this question by analyzing the transcriptome data in 12 mouse organs to determine such major impacting factors.
We found a strong positive correlation between the transcriptional level and rhythmic amplitude of circadian rhythmic genes in mouse organs. Further, transcriptional level could explain over 70% of the variation in amplitude. In addition, the functionality and tissue specificity were not strong predictors of amplitude, and the expression level of rhythmic genes was linked to the energy consumption associated with transcription.
Expression level is a single major factor that impacts the behavior of rhythmic genes in mouse organs. This single determinant points to the importance of rhythmic expression itself in the design of the transcriptional system. Thus, rhythmic regulation of highly expressed genes can effectively reduce the energetic cost of transcription, facilitating the long-term adaptive evolution of the entire genetic system.
Keywords: Circadian rhythm; Rhythmically expressed gene; Rhythmic gene function; Energy cost
Circadian rhythm refers to a 24-h self-sustained oscillation of physiological processes, which is evolutionarily conserved [1, 2, 3, 4, 5]. In animals, this oscillation coordinates various physiological activities, including behaviors such as sleeping and feeding [6, 7, 8]. Surprisingly, this oscillation exists not only in an individual organism as a whole, but is also widely detected in its constituent tissues and single cells [9, 10, 11, 12, 13]. Thousands of genes display rhythmic transcriptional oscillation, as has been determined by either microarray or RNA sequencing technologies [10, 12, 14, 15, 16]. At the single-cell level, almost every cell uses these oscillations to regulate its own overall gene expression [9, 17, 18], indicating that circadian regulation plays a fundamental role in the transcriptional system.
The core circadian transcriptional network in mammals consists of several important transcription factors, such as Clock, Bmal1, Per1/Per2, and Cry1/Cry2. Although the negative feedback loop in the circadian regulatory network is conserved in different tissues, the regulated genes in each tissue are distinct from each other. According to an early study involving microarray expression profiling, approximately 8–10% of expressed genes are rhythmically regulated in the mouse liver and heart. Importantly, rhythmically expressed genes in the two tissues rarely overlap, indicating that these genes are highly tissue-specific. Tissue specificity of rhythmic genes was subsequently widely confirmed. Zhang et al. constructed a circadian gene expression atlas by using data for 12 mouse organs, and found that approximately half of the protein-coding genes are expressed rhythmically, with strong organ-specific signals. Similarly, in humans, more than 7000 genes show a rhythmic expression pattern in at least one of 13 tissues collected; 12% of these genes are drug targets. A systematic study of 64 tissues from the baboon indicated that over 80% of protein-coding genes are rhythmically expressed across the body, with little overlap among tissues. The wide distribution of rhythmically regulated genes indicates their importance to the functional specificity of each tissue.
In addition to tissue specificity, rhythmically expressed genes also exhibit species-specific characteristics. A detailed comparison of 11 tissues in mouse and baboon suggested that only a small proportion of rhythmically expressed genes overlap in each tissue, and no significant correlation was observed between the numbers of rhythmic genes in the two species. Further, only 46 out of 188 rhythmically expressed genes in the epidermis in humans exhibit strong oscillation (i.e., high amplitude and cycling) in the epidermis of mouse. The organ- and species-specific characteristics of rhythmically expressed genes are two fundamental properties of the circadian regulatory network. These properties indicate that the factors that affect the distribution pattern of rhythmic genes may be very complex. However, it is unclear whether a single dominant factor exists.
Here, we aimed to determine whether a single major factor exists that influences the expression of rhythmic genes. By investigating all rhythmically expressed genes in the circadian gene expression atlas, which to date is the largest circadian atlas for mouse, we identified expression level as the key factor that dominates rhythmic gene expression. It explained the majority of the variation in the circadian amplitude of cyclic genes in 12 mouse organs. We also examined the role of gene function and tissue specificity in rhythmic expression. Finally, we surveyed the energy consumed during the expression of different regions of cyclic genes and explored its effects on rhythmic expression. Overall, the presented data suggest that a unified model can potentially be used to explain rhythmic gene expression in various mouse organs.
Gene expression level explains > 70% of the variation in amplitude
Microarray data were used for the computational analyses since the sampling frequency of each tissue was close to that recommended in the large-scale analysis of rhythmic expression. The obtained results were very similar when we used RNA sequencing data from the mouse circadian atlas for the analysis (Additional file 2: Figure S2). In addition to JTK_Cycle, which was originally used to calculate the oscillation features in the mouse atlas, we recalculated the properties of rhythmic genes by using ARSER. The obtained results were similar to those described above (Additional file 3: Figure S3).
Gene functionality and tissue specificity are not strong predictors of amplitude
Further, we found that tissue specificity was correlated with amplitude, i.e., the amplitude of genes that were rhythmically expressed in multiple organs was greater than that of genes rhythmically expressed in only one organ. The correlation between the two parameters was moderate in the 12 organs analyzed (r = 0.26 on average, P < 0.05 in all organs; Additional file 5: Figure S4A–L). Further, the expression levels of genes that were cyclically expressed in multiple organs were higher in some organs than in others (Additional file 6: Figure S5A–L). By utilizing partial correlation analysis, we also observed that the effect of tissue specificity did not explain the correlation between the transcriptional level and amplitude (0.81 ≤ r ≤ 0.88 after controlling for the effect of cyclic tissue number; Additional file 4: Table S6). These observations indicated that tissue specificity is a positive but not strong predictor of amplitude compared with the transcriptional level. In addition, we compared housekeeping genes with other genes and found little difference between the amplitude of cycling housekeeping genes and that of other cycling genes, suggesting that housekeeping status is not a strong predictor of the amplitude of cycling genes (Additional file 4: Table S7).
Energetic cost is linked to expression level and explains the strength of circadian oscillation
Since the transcription of each gene is not cost free, highly expressed genes require greater energy expenditure during transcription than genes expressed at a lower level. Downregulation of the expression of these genes when they are not needed serves to reduce the overall metabolic cost in the cell. We next determined the synthetic cost of the rhythmic transcripts. Briefly, the energetic cost of each mRNA molecule was calculated based on the sequence composition by integrating the energy required for the precursors during mRNA synthesis, the energy required for transcription initiation and termination, and the rate of mRNA degradation. The total energy cost of the transcription of each gene was calculated taking into account the mRNA decay rate and the transcriptional level [25, 26]. As anticipated, we observed a strong positive correlation between the expression level of the transcripts and energy consumption during their transcription (r > 0.75, P < 1 × 10⁻⁵⁰ in all organs; Additional file 7: Figure S6), implying that the rhythmic regulation of transcription of highly expressed genes also determines the energy expenditure.
In mammals, over 50% of the transcriptome is rhythmically regulated in at least one organ. Although previous studies have shown that the expression level of rhythmically expressed genes is higher than that of other genes [25, 27, 28], the extent to which this factor contributed to the rhythmicity of gene expression remained unclear. This question is important considering the possibility of existence of other factors such as different functional pathways that might govern the expression rhythmicity, should the expression level exert only a minor effect on rhythmic gene expression. In the current study, we showed that the expression level of transcripts plays a crucial role in determining whether the rhythmic transcripts are regulated by the circadian regulatory network or not, and that the effect of expression level exceeds that of other potential factors, such as functionality and tissue specificity. We further showed that this single factor can explain > 70% of the variation in the amplitude of rhythmic transcripts. Further, the higher the expression of rhythmic genes, the greater the energy expenditure of the transcription process. Transcriptional systems tend to downregulate the highly expressed genes when their function is not necessary.
Circadian rhythms are closely linked with the cellular metabolism [2, 29]. For instance, the activity of BMAL, one of the core regulators of the circadian regulatory network, is regulated by the transcriptional repressor REV-ERB [30, 31]. The findings of the current study suggest that the output of the circadian regulatory network itself is an energy-saving strategy for the gene expression process. Collectively, these lines of evidence indicate that the regulation of metabolism and metabolic cost are critical for the evolutionary adaptation of the cell.
The lack of preservation of rhythmic properties among divergent tissues and organs, or between divergent species, is different from that of gene function, as the latter is typically highly evolutionarily conserved. This difference is a strong indication that the direct link between the rhythmicity and functionality of the cyclic genes is very weak. Since the regulation of highly expressed genes is a major requirement for circadian gene expression, functional pathways that contain many highly expressed genes are usually over-enriched in cyclic genes, compared with pathways that are expressed relatively weakly. Pathway analysis of rhythmic gene expression should take into account the effects of gene expression levels to obtain an unbiased view of the functional distribution of those genes; this is a caveat for interpreting many previous pathway analyses of cyclic genes.
The observations of the current study indicate that a specific biological function plays a minor role in determining the rhythmic gene expression. Although selective downregulation of highly expressed genes is a systematic strategy to reduce the energetic cost of transcription, under some specific circumstances the function of a particular gene could undoubtedly be directly related to its rhythmicity. For example, PER1, PER2, and PER3 are expressed periodically in at least 8 of 13 human tissues, and Per2 shows robust rhythmic transcription in the mouse liver.
The findings of the current study also indicate that identification of genes whose function is directly related to their rhythmic expression pattern is not a trivial task. Two potential approaches are proposed here for further consideration. One involves controlling for the effect of gene expression, as the expression level is the primary factor determining the rhythmic expression of cyclic genes. If the expression profile of a particular gene is robustly rhythmic regardless of whether the gene is overexpressed or underexpressed in a cell, the function of the gene may be related to its rhythmicity. Another approach is controlling for tissue specificity. We are convinced that genes that are rhythmically expressed in multiple tissues are most likely to be strong candidates for essential cyclic genes. Ultimately, one may find that, contrary to the current widespread observations that the majority of transcribed genes are rhythmically expressed, only a small fraction of these genes are essential cyclic genes.
Here we showed that the transcriptional level is the single factor that dominates the behavior of rhythmic genes in mouse organs. In mouse, on the molecular level, the circadian regulatory network mainly regulates highly expressed genes rather than other genes, to reduce the overall energetic cost. Although many key genes influencing circadian behavior have been identified in the past decades, large gaps remain before circadian behavioral phenotypes can be fully explained by the underlying plethora of molecular activities.
The expression profiles of all transcripts from 12 organs of mouse (Mus musculus) were derived from the mouse circadian gene expression atlas (last accessed in August 2018), which is currently the largest repository of rhythmic expression data for mouse. The parameters of rhythmic gene expression were calculated by using JTK_Cycle [32, 33], with adjusted P < 0.05 (Benjamini-Hochberg–corrected) as the cutoff for identifying cycling genes. Amplitude refers to a one-cycle median sign-adjusted deviation from the median expression, and was calculated by using JTK_Cycle. Only genes with assigned expression values across all time points in a particular tissue were considered “expressed” and used in the analysis. Almost half of the expressed transcripts are rhythmically transcribed. Finally, similar parameters were calculated by ARSER to double-check the primary results. The parameters for determining the energetic cost of each mRNA molecule, such as the synthesis energy required, were derived from the determinations for the yeast metabolic system and based on the number of activated phosphate bonds (~P). The genome-wide mRNA degradation rates were determined by metabolic pulse labeling, as previously reported. The analyses were made under the assumption that the degradation rate is primarily determined by the mRNA sequence and relatively consistent at different rhythmic time points.
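The adjusted-P cutoff described above can be illustrated with a minimal Python sketch of the Benjamini-Hochberg adjustment. This is not the JTK_Cycle code and not the authors' R workflow; the function name `benjamini_hochberg` and the toy P values are assumptions introduced only for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy P values only (hypothetical, not taken from the atlas).
toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_p)
# Genes would be called cycling when the adjusted P is below 0.05.
toy_cycling = [q < 0.05 for q in toy_adjusted]
```

With real data, the input would be the per-gene P values reported by the rhythm-detection tool for one organ.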
To determine whether functional classification exerts a dominant influence on the amplitude of rhythmic genes, enrichment analysis of cycling genes in each organ was performed by utilizing clusterProfiler and the Database for Annotation, Visualization, and Integrated Discovery (DAVID) (https://david.ncifcrf.gov/) [36, 37] and the Reactome pathway website (https://reactome.org/) [38, 39]. GO analysis (enrichment for “Biological process”, “Cellular component”, and “Molecular function”) and KEGG pathway analysis were performed by using the former; Reactome biological pathways were analyzed by using the latter. The background gene list was set to contain all the expressed genes in each tissue. Both the fold-enrichment value and the significance value (p) from the analysis were used as indicators of the strength of cycling gene enrichment in a specific functional category. Log(p) values were used to facilitate downstream analysis. Finally, for each enriched pathway, the average amplitude of rhythmic genes was calculated and correlated with the enrichment strength by using R code. For instance, for the “Biological process” analysis in liver, all the 5822 terms were considered as the background functional term list, and all the 2632 cycling genes detected in liver were used to search for the enriched pathways, with multiple testing correction. The results show that 499 pathways were enriched (P < 0.05, BH-corrected). Other enrichment analyses were performed similarly.
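The two enrichment indicators mentioned above, fold enrichment and a significance value, can be sketched in Python with a plain hypergeometric over-representation test. The tools named in the text implement their own variants, so this is only an assumed stand-in; the function name `enrichment` and the toy counts are hypothetical.

```python
from math import comb


def enrichment(n_background, n_in_term, n_cycling, n_overlap):
    """Fold enrichment and upper-tail hypergeometric P value.

    n_background: expressed genes in the tissue (background list)
    n_in_term:    background genes annotated to the functional term
    n_cycling:    cycling genes in the tissue
    n_overlap:    cycling genes annotated to the term
    """
    fold = (n_overlap / n_cycling) / (n_in_term / n_background)
    upper = min(n_in_term, n_cycling)
    # P(X >= n_overlap) when drawing n_cycling genes without replacement.
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_cycling - k)
        for k in range(n_overlap, upper + 1)
    )
    return fold, tail / comb(n_background, n_cycling)


# Toy counts only (hypothetical): 20 expressed genes, 5 in the term,
# 4 cycling genes, 3 of which fall in the term.
toy_fold, toy_p = enrichment(20, 5, 4, 3)
```

The resulting P values would still need multiple-testing correction across terms before applying a cutoff.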
Calculation of the energetic cost of mRNA
The energetic cost of mRNA was determined by the number of activated phosphate bonds (~P) as described previously [25, 26]. The synthesis cost for each mRNA molecule is mainly determined by the energy usage of synthesizing each nucleotide and the nucleotide composition of mRNA. Hence, both the synthesis cost of a single mRNA molecule and its copy number were considered in each calculation. To distinguish the cost effects of different transcriptional regions on the amplitude, the energetic costs of 3′ UTR, 5′ UTR, and coding regions were calculated separately. Overall, 30,720,384 transcripts were analyzed. The cost for each mouse gene is listed in Additional file 9: Table S10.
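A rough Python sketch of a composition-based cost calculation per region is given here. Everything numeric in it is an assumption: the per-nucleotide values are arbitrary placeholders rather than the published ~P parameters, the steady-state formula (cost of one molecule × abundance × decay rate) is one plausible reading of the description rather than the authors' stated equation, and initiation and termination costs are left out. The names `PLACEHOLDER_NT_COST`, `region_cost`, and `gene_cost_rate` are hypothetical.

```python
# Hypothetical per-nucleotide synthesis costs in arbitrary units.
# These are NOT the published ~P values; substitute real parameters.
PLACEHOLDER_NT_COST = {"A": 1.0, "C": 2.0, "G": 3.0, "U": 4.0}


def region_cost(sequence, nt_cost=PLACEHOLDER_NT_COST):
    """Sum per-nucleotide costs over one region of a single mRNA."""
    rna = sequence.upper().replace("T", "U")  # accept DNA-style input
    return sum(nt_cost[nt] for nt in rna)


def gene_cost_rate(regions, abundance, decay_rate,
                   nt_cost=PLACEHOLDER_NT_COST):
    """Cost per unit time for each region of one gene.

    Assumes steady state, so the synthesis rate equals
    abundance * decay_rate (an assumption of this sketch).
    """
    return {
        name: region_cost(seq, nt_cost) * abundance * decay_rate
        for name, seq in regions.items()
    }


# Toy transcript (hypothetical sequences and values).
toy_regions = {"5'UTR": "AACG", "CDS": "AUGGCC", "3'UTR": "UUUA"}
toy_costs = gene_cost_rate(toy_regions, abundance=10.0, decay_rate=0.5)
```

Keeping the three regions in separate dictionary entries mirrors the separate treatment of 5′ UTR, coding region, and 3′ UTR described above.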
Linear regression analysis
Linear regression analysis was used to quantify the relationship between the transcription level and amplitude of rhythmic genes. Averaged expression from different sampling time points was calculated for each organ. The logarithm of the averaged expression was then correlated with the logarithm of the amplitude value of each rhythmic gene. To describe the extent to which the changes in expression affected the changes in amplitude, the coefficient of determination in the linear regression was calculated. As in the common interpretation of linear regression analysis, R² was used to indicate the contribution of the transcription level to the variance of amplitude, namely, the explained variation of the amplitude of rhythmic genes.
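The regression just described can be sketched in a few lines of Python. The authors report using the R “stats” package, so this is a stand-in under stated assumptions: base-10 logarithms are assumed (the base does not change R²), and the function name `log_log_fit` and the toy values are hypothetical.

```python
import math


def log_log_fit(mean_expression, amplitude):
    """Least-squares fit of log10(amplitude) on log10(expression).

    Returns (slope, intercept, r_squared).
    """
    x = [math.log10(v) for v in mean_expression]
    y = [math.log10(v) for v in amplitude]
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # For simple linear regression, R^2 is the squared Pearson r.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Toy values only (hypothetical, perfectly log-linear).
toy_expr = [10.0, 100.0, 1000.0, 10000.0]
toy_amp = [1.0, 10.0, 100.0, 1000.0]
toy_slope, toy_intercept, toy_r2 = log_log_fit(toy_expr, toy_amp)
```

In practice the inputs would be the time-averaged expression and the amplitude of each rhythmic gene within one organ, with one fit per organ.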
To examine whether the proportion of cycling genes increases with increasing transcription, the average expression levels in each organ were used. All the expressed genes in each organ were divided into five groups according to expression level (top 20%, 20–40%, 40–60%, 60–80%, and bottom 20%). The proportion of cycling genes in each category was calculated as the number of cycling genes in that category divided by the total number of cycling genes. Following this strategy, “top 50% highly expressed genes” was defined as the top half of all genes with assigned expression values at all sampling time points in each tissue. This gene set was then used to determine the existence of correlation between the expression level and amplitude of highly expressed genes.
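The five-group calculation above can be sketched in Python as follows. The function name `cycling_share_by_quintile`, the gene identifiers, and the toy values are hypothetical, and the sketch assumes at least one cycling gene; group boundaries use integer division, so group sizes may differ by one when the gene count is not a multiple of five.

```python
def cycling_share_by_quintile(expression, cycling_genes):
    """Share of all cycling genes found in each expression quintile.

    expression:    dict mapping gene -> mean expression in one organ
    cycling_genes: set of genes called cycling in that organ
    Returns five proportions, ordered from the top 20% to the bottom 20%.
    """
    ranked = sorted(expression, key=expression.get, reverse=True)
    n = len(ranked)
    total_cycling = sum(1 for g in ranked if g in cycling_genes)
    shares = []
    for k in range(5):
        lo = k * n // 5
        hi = (k + 1) * n // 5
        in_bin = sum(1 for g in ranked[lo:hi] if g in cycling_genes)
        # Denominator is the total number of cycling genes, as in the text.
        shares.append(in_bin / total_cycling)
    return shares


# Toy data only (hypothetical): g0 is the most highly expressed gene.
toy_expression = {f"g{i}": float(10 - i) for i in range(10)}
toy_cycling = {"g0", "g1", "g2", "g5"}
toy_shares = cycling_share_by_quintile(toy_expression, toy_cycling)
```

Because every share uses the same denominator, the five values sum to one within each organ.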
Principal component analysis
Principal component analysis was used to evaluate the overall contribution of energetic cost to the amplitude of rhythmic genes. This was because, although the energetic costs of the 5′ UTR, 3′ UTR, and coding region of rhythmic transcripts correlated strongly with amplitude, these three variables were also significantly interrelated. The dimensionality of these factors was reduced by using principal component analysis. The analysis was performed using the formula: amplitude ~ cost of 5′ UTR + cost of 3′ UTR + cost of coding region. Overall, all genes containing 5′ UTR and 3′ UTR regions (19,622 genes) were included in the analysis. In the permutation experiments, the amplitude value of rhythmic genes was shuffled 1000 times; for each shuffle, the principal component analysis was performed, and the explained effect of energetic cost on amplitude was determined. Energetic cost and amplitude values were log-transformed for all the above analyses.
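One way to read the procedure above is sketched below in Python: the three interrelated cost variables are collapsed onto their first principal component, the squared correlation of that component with amplitude is taken as the explained effect, and amplitude is then shuffled to build a null distribution. Using only the first component, the power-iteration solver, and the empirical P value formula are assumptions of this sketch and may differ from the authors' R analysis; all function names and toy values are hypothetical, and inputs are assumed to be already log-transformed.

```python
import math
import random


def first_pc_scores(rows, iters=200):
    """Project rows onto the first principal component.

    Uses power iteration on the covariance matrix; assumes the
    features are not all constant.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1)
            for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(d)) for c in centered]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def permutation_test(scores, amplitude, n_perm=1000, seed=0):
    """Observed R^2 and the share of shuffles that match or beat it."""
    rng = random.Random(seed)
    observed = r_squared(scores, amplitude)
    shuffled = list(amplitude)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if r_squared(scores, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy log-costs (5' UTR, 3' UTR, coding region) and log-amplitudes.
toy_costs = [[b, b + 0.1 * (-1) ** i, 2.0 * b]
             for i, b in enumerate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])]
toy_amp = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2]
toy_scores = first_pc_scores(toy_costs)
toy_r2, toy_p = permutation_test(toy_scores, toy_amp)
```

Since the cost variables are unchanged by shuffling amplitude, the component scores are computed once here and reused for every permutation.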
Partial correlation test
Analyses indicated that for rhythmic genes, the amplitude slightly increases as the cyclic tissue number increases. Cyclic tissue number was defined as the number of tissues in which a particular transcript exhibits rhythmic expression. To investigate whether the correlation between transcription level and amplitude existed after controlling for this effect, a partial correlation test was used. Partial correlation coefficients were calculated for each organ.
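For a single controlling variable, the partial correlation can be written with the standard first-order formula, sketched here in Python. This is not the “ggm” package code; the function names `pearson` and `partial_corr` and the toy vectors are hypothetical, with x standing for transcription level, y for amplitude, and z for cyclic tissue number.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((v - ma) ** 2 for v in a)
    sbb = sum((v - mb) ** 2 for v in b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab / math.sqrt(saa * sbb)


def partial_corr(x, y, z):
    """Correlation of x and y after controlling for z (first order)."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt(
        (1.0 - r_xz ** 2) * (1.0 - r_yz ** 2)
    )


# Toy vectors only (hypothetical): x and y are related solely through z,
# so the raw correlation is 0.5 but the partial correlation is zero.
toy_z = [1.0, 1.0, -1.0, -1.0]
toy_x = [2.0, 0.0, 0.0, -2.0]
toy_y = [2.0, 0.0, -2.0, 0.0]
toy_raw = pearson(toy_x, toy_y)
toy_partial = partial_corr(toy_x, toy_y, toy_z)
```

A partial correlation that stays close to the raw value, as reported in the results, would indicate that the controlled variable accounts for little of the relationship.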
The analysis and processing of all the data were performed by using R software. The “stats” package was used for the linear regression analysis and principal component analysis, and “ggm” package was used for the partial correlation test.
We thank the Hogenesch group for kindly sharing the circadian gene expression data and Lijun Lian for the critical reading of the manuscript.
GZW conceived the project; YC and YHC performed the analyses; GZW, YC, and LZ discussed the results; GZW and YC wrote the manuscript. All authors read and approved the manuscript.
This work was supported by the National Key R&D Program of China (2016YFC0901700 and 2016YFC1303100) and the National Natural Science Foundation of China (nos. 81827901, 31600960, and 31871333). The funding bodies had no role in the design of the study; the collection, analysis, and interpretation of data; and in writing of the manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
- 8. Mendoza J. Food intake and addictive-like eating behaviors: time to think about the circadian clock(s). Neurosci Biobehav Rev. 2019;106:122–32.
- 16. Ruben M, Wu G, Smith D, Schmidt R, Francey L, Lee Y, et al. A database of tissue-specific rhythmically expressed human genes has potential applications in circadian medicine. Sci Transl Med. 2018;10(458):eaat8806.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.