Introduction

Normal cell functioning requires appropriate gene expression, which depends on multiple regulatory layers (see [1] for a review). In this context, transcriptional regulatory modules (TRMs) have been extensively studied (for instance [2,3,4,5]). By definition, a TRM is a set of genes whose transcriptional activity is modulated by a specific transcription factor (TF) [6]. In the model yeast Saccharomyces cerevisiae, TRMs are well described [2,3,4,5], and public databases such as YEASTRACT [7] or SGD [8] provide lists of target genes for any TF. TRMs have been explored to better understand both their individual organization and their collective relationships [4, 5, 9, 10]. In most studies, these questions were addressed by representing TRMs as networks, in which TFs and target genes are the nodes, connected by directed edges (from each TF to its targets). Topological properties of such networks were analysed to reveal the design principles underlying transcriptional regulation. This approach led to the discovery of important regulatory motifs, which are surprisingly consistent across very different species [10, 11].
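The network representation described above can be sketched minimally as follows, assuming regulatory associations are available as (TF, target) pairs. Each TF's set of direct targets is its TRM, and inverting the mapping shows which targets are shared by several TRMs. The identifiers and the helper names build_trms and regulators_of are hypothetical placeholders, not YEASTRACT records or an existing API.

```python
from collections import defaultdict


def build_trms(edges):
    """Group directed (TF, target) edges into TRMs: TF -> set of target genes."""
    trms = defaultdict(set)
    for tf, target in edges:
        trms[tf].add(target)  # duplicate edges collapse into one association
    return dict(trms)


def regulators_of(trms):
    """Invert the TRM mapping: target gene -> set of TFs whose TRM contains it."""
    regulators = defaultdict(set)
    for tf, targets in trms.items():
        for target in targets:
            regulators[target].add(tf)
    return dict(regulators)


# Placeholder associations for illustration only (not YEASTRACT records).
edges = [("TF_A", "gene1"), ("TF_A", "gene2"), ("TF_B", "gene2"),
         ("TF_B", "gene3"), ("TF_A", "gene1")]
trms = build_trms(edges)
print({tf: sorted(targets) for tf, targets in trms.items()})
# {'TF_A': ['gene1', 'gene2'], 'TF_B': ['gene2', 'gene3']}
print(sorted(regulators_of(trms)["gene2"]))
# ['TF_A', 'TF_B']
```

In this representation the size of a TRM is the out-degree of its TF, and any target mapped to more than one TF belongs to several TRMs.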

In addition to this information, the spatial organization of the 16 chromosomes of S. cerevisiae has been reported in the literature [1]. Experimental techniques derived from chromosome conformation capture (3C) were used to obtain a three-dimensional (3D) model [12]. This model is based on the idea that interphase chromosomes are not positioned randomly within the nucleus. In particular, chromosomes are expected to adopt a “Rabl configuration”, in which centromeres are clustered together at one pole of the nucleus, whereas the arms extend in several directions toward the telomeres, which abut the nuclear envelope. Moreover, chromosome 12, which carries the rDNA repeats in S. cerevisiae, is expected to extend outward to join the nucleolus, i.e. the site of ribosome biogenesis (Additional file 1). This 3D model is consistent with the existence of a repressive chromatin structure, i.e. silent chromatin, which has long been known in yeasts (see [13] for a review) and affects mating-type loci, telomeres and rDNA repeats. More recently, this 3D model was used to study potential connections between interchromosomal DNA contacts and gene co-expression [14]. Significant correlations were found, supporting the idea that the non-random organization of the genome helps to coordinate transcriptional processes in groups of genes, such as those found in TRMs.

In this work, our aim was to gain additional insights into the organization of TRMs based on the 3D model of the S. cerevisiae genome at interphase. We explored TRMs from a new perspective, integrating the functional and spatial information presently available, and addressed the following question: are the target genes of a common TF (i.e. of a TRM) randomly disseminated within the nucleus, or are they preferentially co-localized? In the literature, this question has only been partially answered, with a focus on spatial distances between genes coding for TFs and their targets [15]. Our analysis represents an additional step in this context, reporting all distances between genes that belong to any TRM, as described in the latest release of the YEASTRACT database. Statistical parameters are provided to quantify the strength of any bias observed in the distributions of pairwise Euclidean distances calculated between lists of genes. A web tool called 3D-Scere (https://3d-scere.ijm.fr/) was also developed, with which any researcher can retrieve information for all pairs of genes in a list of his/her interest.

Main text

This study builds on two previous analyses of transcriptional regulatory modules recently presented in the literature. The first is that of Monteiro et al. [2], published in 2020. The authors assessed the regulatory features of the current transcriptional network of S. cerevisiae, taking advantage of the latest release of their YEASTRACT database, which comprised almost 200,000 interactions involving 220 TFs and 6886 target genes. The second is that of Sun et al. [15], published in 2019. The authors used the 3D model of the S. cerevisiae genome proposed by Duan et al. [12] to study the spatial organization of the regulatory network of S. cerevisiae. We propose that the perspectives and data of the two studies can be combined to extend the scope of the results presented in each. Indeed, both studies have strengths, but also limitations. On the one hand, the study of Monteiro et al. is based on a considerable effort to collect, clean, and organize the transcriptional regulations identified in more than a thousand publications in peer-reviewed international journals. Notably, the authors also provided confidence-level information for each regulation, thus delivering very high-quality data. They observed interesting topological properties of the global S. cerevisiae transcriptional network and discussed the complexity of the transcriptional regulatory processes that control gene expression. In that respect, searching for a potential role of genome organization in the functioning of this network represents a natural perspective. On the other hand, Sun et al. had the original idea of placing the transcriptional regulations between genes in the context of the 3D genome model available for S. cerevisiae. They concluded that “the transcriptional regulatory network of S. cerevisiae presents an optimized structure in space to adapt to functional requirements”.
Although this conclusion is undoubtedly very promising, we think that it (i) suffers from the use of transcriptional regulations that were only partially verified and (ii) lacks analyses of individual TRMs.

The work presented in this article was performed in three steps. First, the TRMs were extracted from the study of Monteiro et al. [2]. The supplementary data provided all YEASTRACT transcriptional regulations, annotated according to “binding evidence”, “expression evidence” or “both”. We decided to focus on regulatory associations that relied on “binding evidence” only. They represent 176 TFs and 6475 target genes, connected by 45,209 associations (23% of the full dataset of regulatory associations). Second, the 3D model from the study of Duan et al. [12] was recovered. The related supplementary data provide the 3D coordinates of 26,538 “points”. Each point can be seen as a precise location in space, defined by its 3D coordinates (x, y, z). Together, the points define all chromosomes of the S. cerevisiae genome (Fig. 1a). Along each chromosome, pairs of successive points delimit chromosomal regions in space. Note that these regions were of variable sizes because the points in the initial 3D model were not equidistant. For instance, we observed that where chromosomes are folded or change direction in space, more points were used to model the same number of DNA base pairs. Three-dimensional coordinates for 9185 S. cerevisiae genome features (including 6572 ORFs) were then derived (Additional file 2) and used to calculate spatial Euclidean distances between all pairs of genome features (42,177,520 distances) (Fig. 1b). All distance calculations are available as Additional file 3. For each TRM defined by the 176 TFs, the pairwise distances between its target genes were selected. Finally, the distance distributions obtained with all features of the S. cerevisiae genome and with the subset of genes that belong to a particular TRM were superimposed and used to quantify a potential bias toward co-localization (smaller distances) between target genes in TRMs.
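The text does not specify how feature coordinates were derived from the model points. The sketch below assumes, hypothetically, that each model point carries a genomic position in base pairs and places a feature at its midpoint by linear interpolation between the two successive points that bracket it; it then computes Euclidean distances for all pairs of features. The helper names and toy values are illustrative assumptions, not the authors' procedure. The pair count n(n-1)/2 gives 42,177,520 for 9185 features, matching the figure above.

```python
import numpy as np


def interpolate_feature(point_bp, point_xyz, start_bp, end_bp):
    """Hypothetical helper: 3D position of a feature's midpoint on one chromosome.

    point_bp  -- increasing genomic positions (bp) of the model points (assumed known)
    point_xyz -- array of shape (n_points, 3) with the x, y, z of each point
    The midpoint is placed by linear interpolation between the two successive
    points that bracket it (np.interp clamps positions outside the range).
    """
    mid = (start_bp + end_bp) / 2.0
    return np.array([np.interp(mid, point_bp, point_xyz[:, k]) for k in range(3)])


def condensed_distances(coords):
    """Euclidean distances for all pairs i < j, in scipy pdist's condensed order."""
    i, j = np.triu_indices(len(coords), k=1)
    return np.linalg.norm(coords[i] - coords[j], axis=1)


def n_pairs(n):
    """Number of unordered pairs among n features: n(n-1)/2."""
    return n * (n - 1) // 2


# Toy chromosome: three model points 10 kb apart (hypothetical values).
bp = np.array([0.0, 10_000.0, 20_000.0])
xyz = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
f1 = interpolate_feature(bp, xyz, 4_000, 6_000)    # midpoint 5 kb  -> (0.5, 0, 0)
f2 = interpolate_feature(bp, xyz, 14_000, 16_000)  # midpoint 15 kb -> (1, 0.5, 0)
f3 = interpolate_feature(bp, xyz, 0, 0)            # first point    -> (0, 0, 0)
coords = np.vstack([f1, f2, f3])

print(condensed_distances(coords))  # pairs (f1,f2), (f1,f3), (f2,f3)
print(n_pairs(9185))  # 42177520
```

The distances for one TRM would follow from condensed_distances(coords[target_rows]), where target_rows is a hypothetical index array of that TF's targets. For all 9185 features, the triu_indices approach materializes very large intermediate arrays; scipy.spatial.distance.pdist returns the same condensed vector with far less memory.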
All results are available as Additional file 4. To quantify the deviation of each TRM distribution from the distribution of all distances, a Kolmogorov-Smirnov (KS) test with Bonferroni correction was performed. This revealed several TFs whose target genes exhibit atypical locations within the nucleus. These TFs are listed in Table 1, and the distance distributions of the four TRMs with the highest KS statistic (i.e. the highest deviation from the distribution of all distances) are shown in Fig. 2. An interesting case, the Upc2 transcriptional module, is detailed in Additional file 6.
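A minimal sketch of this comparison, assuming SciPy's two-sample KS test (scipy.stats.ks_2samp) and a Bonferroni factor equal to the number of TRMs tested; the text does not state which implementation or test variant was used. The filter mirrors the selection criteria given in the Fig. 2 legend (more than 30 targets, adjusted p-value < 0.05, ranked by KS statistic). Function names and dictionary keys are hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp


def trm_deviation(trm_distances, all_distances, n_tests):
    """Two-sample KS test of one TRM's pairwise distances against all distances.

    Returns (D, Bonferroni-adjusted p-value); n_tests is assumed to be the
    number of TRMs tested.
    """
    result = ks_2samp(trm_distances, all_distances)  # two-sided by default
    p_adj = min(1.0, float(result.pvalue) * n_tests)
    return float(result.statistic), p_adj


def select_colocalized(results, min_targets=30, alpha=0.05, top=4):
    """Keep TRMs with > min_targets targets and p_adj < alpha, ranked by D."""
    kept = [r for r in results if r["n_targets"] > min_targets and r["p_adj"] < alpha]
    return sorted(kept, key=lambda r: r["D"], reverse=True)[:top]


# Toy data: a TRM whose distances sit at the low end of the genome-wide range.
all_d = np.arange(1.0, 101.0)
D, p_adj = trm_deviation(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), all_d, n_tests=176)
print(round(D, 2))  # 0.95
```

With alternative="greater" and the TRM distances as the first sample, ks_2samp would instead test specifically whether the TRM distance CDF lies above the genome-wide one, i.e. a shift toward smaller distances.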

Fig. 1

Spatial organization of the 16 chromosomes of S. cerevisiae. a Screenshot of the 3D genome model available in the 3D-Scere tool. b Density histogram (light blue) of all Euclidean distances between chromosomal features in the 3D model (see “Main text”), and the corresponding cumulative distribution function (CDF, dark blue)

Table 1 Statistical parameters derived from the study of the organization of transcriptional regulatory modules based on a 3D model of the Saccharomyces cerevisiae genome
Fig. 2

Examples of transcriptional regulatory modules in which targets are preferentially co-localized within the nucleus. Distance histograms (pink) for the targets of four TFs (STB4, AZF1, MOT3 and UPC2) are shown and compared with the histogram of all distances (light blue) presented in Fig. 1. These TFs were selected because (i) they have more than 30 target genes, (ii) they exhibit the highest Kolmogorov-Smirnov statistics, and (iii) the associated adjusted p-values are significant (< 0.05)

Finally, an open-source tool was developed for interactive visualization and exploration. Source code is available on GitHub (https://github.com/data-fun/3d-scere), and the tool is freely usable online at https://3d-scere.ijm.fr/. It allows the visualization of any list of genes in the context of the 3D model of the S. cerevisiae genome (see Additional file 5 for screenshots). Further information, such as functional annotations (GO terms) or gene expression measurements, can easily be added. Qualitative or quantitative functional properties can thus be highlighted in the large-scale 3D context of the genome with only a few mouse clicks.
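The actual implementation of 3D-Scere lives in the GitHub repository and is not reproduced here. The stdlib-only sketch below merely illustrates the kind of query the tool supports, retrieving the distances for all pairs of genes in a list of interest. The coordinate mapping, gene names, and the pairwise_table helper are hypothetical placeholders.

```python
import math
from itertools import combinations


def pairwise_table(gene_list, coords):
    """Hypothetical query: all pairwise distances within a user's gene list.

    coords maps gene name -> (x, y, z). Returns (rows, missing), where rows are
    (gene_a, gene_b, distance) sorted from closest to farthest and missing lists
    the requested genes that have no coordinates.
    """
    unique = list(dict.fromkeys(gene_list))  # drop duplicates, keep input order
    found = [g for g in unique if g in coords]
    missing = [g for g in unique if g not in coords]
    rows = [(a, b, math.dist(coords[a], coords[b])) for a, b in combinations(found, 2)]
    rows.sort(key=lambda row: row[2])
    return rows, missing


coords = {"geneA": (0.0, 0.0, 0.0), "geneB": (3.0, 4.0, 0.0), "geneC": (0.0, 1.0, 0.0)}
rows, missing = pairwise_table(["geneA", "geneB", "geneC", "geneX"], coords)
for a, b, d in rows:
    print(a, b, round(d, 3))
# geneA geneC 1.0
# geneB geneC 4.243
# geneA geneB 5.0
print(missing)  # ['geneX']
```

Reporting missing genes explicitly matters in practice, since only a subset of genome features received 3D coordinates.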

Limitations

We see three main limitations to this work. The first concerns the biological relevance of the 3D model of the S. cerevisiae genome that was used. Created more than 10 years ago [12], this structural model represents only a static (and averaged) view of the relative positioning of the 16 chromosomes in the nucleus at interphase. It was obtained from 3C experimental data, which had to be processed with complex numerical procedures to find an optimal solution. Because “optimal” does not guarantee “real”, all observations that emerge from this model must be further validated. In that respect, new data generated with the latest and most powerful Hi-C techniques at different stages of the S. cerevisiae cell cycle, capturing the dynamics of its genome organization, would be of great interest. The second limitation concerns the lack of landmarks for localizing genes within the nucleus. Are they located near the nuclear envelope, and possibly near nuclear pores that would allow, for instance, the rapid export of transcripts to the cytoplasm? Such information is presently missing from our analyses. One solution could be to calculate additional distances to reference points on chromosomes, such as centromeres, telomeres, or the emblematic rDNA repeat region that extends outward toward the nucleolus. Finally, the third limitation, in our view, lies in the definition of TRMs itself. We defined a TRM as a set of genes whose expression is modulated by a common TF, and in this work we analysed each TRM individually. However, a target gene can belong to several TRMs, and its transcriptional regulation may require the combined action of several TFs. Such genes could be examined specifically for particular co-localizations in the 3D model of the S. cerevisiae genome. Our strategy thus opens interesting research perspectives for the study of gene lists that belong to transcriptional modules, but it can also be applied to any list of genes.
For example, spatial proximity could be studied between strongly (or weakly) expressed genes, between genes encoding proteins involved in common metabolic pathways, or between genes whose products associate within complexes. In this context, the online tool (https://3d-scere.ijm.fr/) will be of interest to the community, allowing any researcher to query any list of genes of particular interest to him/her.
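The landmark-based distances proposed above as a remedy for the second limitation could be computed along the lines of the following sketch. It assumes that 3D coordinates are available both for genes and for chosen reference points (for example centromeres or telomeres); all names, coordinates, and the nearest_landmark helper are hypothetical.

```python
import math


def nearest_landmark(gene_coords, landmarks):
    """For each gene, the closest reference point and its Euclidean distance.

    gene_coords and landmarks map names to (x, y, z); landmark names such as
    centromere or telomere labels are placeholders. Ties go to the landmark
    listed first.
    """
    nearest = {}
    for gene, xyz in gene_coords.items():
        name = min(landmarks, key=lambda lm: math.dist(xyz, landmarks[lm]))
        nearest[gene] = (name, math.dist(xyz, landmarks[name]))
    return nearest


landmarks = {"centromere_cluster": (0.0, 0.0, 0.0), "telomere_X": (10.0, 0.0, 0.0)}
genes = {"gene1": (1.0, 0.0, 0.0), "gene2": (8.0, 0.0, 0.0)}
print(nearest_landmark(genes, landmarks))
# {'gene1': ('centromere_cluster', 1.0), 'gene2': ('telomere_X', 2.0)}
```

Per-gene landmark distances for the targets of a TRM could then be compared with those of all genes, for example with the same KS approach used for pairwise distances.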