Background

Even at a time when systems biology is coming close to simulating complete organisms in silico [1], bacterial gene regulation is far from being understood. To organize gene expression in time and space, i.e., in the right order and quantities across chromosome locations, several evolutionary targets are plausible:

  (1) long-range interactions between genes implemented by dedicated transcription factors (TFs),

  (2) stabilization of mesoscopic or macroscopic DNA structures, and

  (3) binding site affinity.

Just like regulation by TFs, nucleation of large-scale chromosome structures requires specific DNA sequences. Dynamic DNA loops, conducive to transcription, and stiff, tightly interwound regions that are largely inaccessible are induced and stabilized by nucleoid-associated proteins (NAPs). The local shape of the binding energy landscape therefore has a direct impact on the global chromosome conformation.

Furthermore, the gene expression patterns serve cellular functions. Qualitatively speaking, metabolic function largely dictates the requirements for the spatiotemporal organization of gene expression. The patterns of expression changes have evolved to match the metabolic needs of the organism under the conditions at hand.

Over the last ten years a wide range of studies provided first evidence that all these components are strongly interlinked and shape the observed gene expression patterns (see, e.g., [24]). The interplay of genome architecture and global regulators [5], the genome’s spatiotemporal organization [6], as well as the relationship of sequence information and genome architecture [7] are challenging to understand.

On a more general level, the task of comparing dynamical information, e.g., time series available for each node, with a given network occurs in different variations in a wide range of disciplines: In neuroscience the problem of relating functional connectivity, i.e., a network derived from correlations in node activities, with structural connectivity, e.g., a network of cortical areas connected by fibers, is currently of high relevance [8–10]. In systems biology, the comparison of metabolomics data with metabolic networks belongs to the same class of statistical problems (see, for example, [11, 12]). A related question is network inference, i.e., how to estimate interaction networks from dynamical data. For gene regulatory networks this is exemplified in [13, 14] and for the reconstruction of microbial interaction networks from microbiome compositions in [15]. In this context, the whole research field of Bayesian networks needs to be mentioned [16]. A recent review of a wide range of such network reconstruction, inference, and modeling methods is given in [17].

At the core of many of these research directions are methods from the statistical physics of complex networks, since the organization of dynamical processes on graphs is among the most prominent research questions in this field (see, e.g., [18–20] for reviews). In our investigation we resort to essentially the same set of concepts and methods: The study of random Boolean networks has a rich history in statistical physics (see, e.g., [21]), even before it gained such recent prominence in systems biology [22–24].

The results of a wide range of our own investigations over the last years have their foundations in statistical physics and information theory [2, 4, 25–31]. They have cemented the notion of a tight interplay in bacterial gene regulation between the regulatory network implemented via TFs and their binding sites (digital control) and the regulation implemented via alterations of chromosomal configuration and DNA compaction (analog control), as depicted in Fig. 1. Our findings show that this interplay is not only revealed by the dynamical quantities, i.e., the control strengths [25, 32], but is also clearly visible on the structural level [33]. The distribution patterns of genes under regulation by TFs and of genes without (known) regulation by TFs are fundamentally different. Using methods from point process statistics, we observed systematically shorter distances among the latter class of genes, suggesting a higher importance of (distance-driven) analog control [33].

Fig. 1

Schematic illustration of digital and analog control. The different levels of system information. Part (a) shows the positions of genes/operons on the circular chromosome and the regulatory interactions mediated by transcription factors. These are analyzed separately in terms of the transcriptional regulatory network (b), which contains all interactions mediated by transcription factors, and the gene proximity network (c), which contains links between two genes/operons if their distance on the chromosome is smaller than a certain threshold. Transcriptome data are then mapped onto these two networks in order to quantify the corresponding strengths of digital and analog control (d)

In [27] the two arms of the chromosome from chromosomal replication origin (OriC) to terminus (Ter) have been established as an appropriate coordinate system to interpret the spatiotemporal changes in gene expression. When going to a more localized version of the formal control strength, strong shifts of importance between digital and analog control are observed across different cellular conditions and growth phases. Generally, analog control is organized along these linear chromosomal segments from OriC to Ter, while digital control is organized across these segments.

Finally, by applying the methods of quantifying digital and analog control to the subset of gene expression changes consistent with predicted metabolic flux patterns (via flux-balance analysis, see [33]), we found that these expression changes, which are apparently instrumental in generating a coherent metabolic state, are predominantly under analog control.

In [32], we established methods for quantifying the amount of digital and analog control contained in a set of significantly differentially expressed genes for the bacterium Escherichia coli. In that way, we discovered a tight interplay between digital and analog control governing gene expression patterns in response to diverse perturbations of the gene regulatory machineries.

Our general view is that gene expression data can be decomposed into ‘sub-patterns’ compliant with (or dictated by) a certain regulatory mechanism, like the transcriptional regulatory network (TRN) or DNA topology. The distinction of (and coupling between) digital and analog control has also been supported by a statistical analysis of gene locations [33].

In order to further test and validate the previously formulated hypothesis of a buffering between digital and analog control [32], we have measured gene expression time courses for wildtype E. coli and two mutants. This allows us to verify the previous findings based on a more modern platform, RNA-seq, and to include the temporal dimension in our, now continuous, assessment of digital and analog control.

Before applying the new methods to real expression profiles, we assess the performance of digital control strength quantifiers using the framework of random Boolean networks. Lastly, we discuss the implications of our findings for our understanding of bacterial gene regulation.

Methods

Cell growth conditions and mRNA isolation

The E. coli CSH50 Δ fis and Δ hns strains were grown in 4 L of double rich medium (dYT) in a fermenter under constant pH 7.4 and high aeration (500 rpm stirring, 5 L air per min) at 37 °C. The culture was inoculated from 16 h overnight cultures at an initial OD 600 of 0.1. Cells were grown for 7 h and samples for RNA-seq were taken at 1, 2, 3, 5 and 7 h after inoculation (see Table 1). Each sample was immediately dissolved in ice-cold ethanol-phenol (5 % phenol) solution to prevent mRNA degradation. RNA was extracted using the RNeasy Mini kit (Qiagen) and treated with Turbo DNase (Life Technologies). Subsequent rRNA depletion was carried out using the MicrobExpress kit (Life Technologies) and 0.5 μg of enriched mRNA of each sample were sent for RNA-seq (Illumina HiSeq 2000).

Table 1 The table denotes the time in minutes after inoculation of E. coli cultures in fresh medium when the cells were harvested for sequencing. It shows this information for the wildtype (wt), fis and hns mutant

Gene expression analysis

The 50 bp Illumina HiSeq reads were mapped on the E. coli MG1655 genome (NCBI). Chromosomal repeats were masked. Gene expression was determined by normalizing the coding sequence (CDS) reads with the total number of reads as well as the length of the CDS. The expression values of each gene at a given time point were normalized to the sum of expression of all genes at that time point. The expression curve of each gene was subsequently interpolated by a natural spline method. Resulting expression curves were verified by fluorescence measurements of yfp-coupled promoters exhibiting different temporal patterns. The wild type data was first published in [31]. The fis and hns mutant strain data is new to this work.
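The two normalization steps described above can be sketched as follows. This is a minimal illustration, not the pipeline used for the data: the function name `expression_levels`, the gene names and the read counts are hypothetical, and "total number of reads" is assumed here to mean the sum of the CDS read counts passed in.

```python
def expression_levels(read_counts, cds_lengths):
    """Length- and depth-normalized expression, rescaled to sum to one.

    read_counts: dict gene -> CDS read count at one time point (assumed input).
    cds_lengths: dict gene -> CDS length in bp.
    """
    total_reads = sum(read_counts.values())
    # Step 1: normalize by the total number of reads and by the CDS length.
    raw = {
        gene: count / (total_reads * cds_lengths[gene])
        for gene, count in read_counts.items()
    }
    # Step 2: normalize to the sum of expression of all genes at this time point.
    scale = sum(raw.values())
    return {gene: value / scale for gene, value in raw.items()}


# Hypothetical example: two genes with identical read density.
counts = {"geneA": 100, "geneB": 300}
lengths = {"geneA": 1000, "geneB": 3000}
levels = expression_levels(counts, lengths)
```

In this sketch the factor `total_reads` cancels in the second step, so the final values depend only on the read density per base pair of each gene relative to all others at the same time point.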

Although RNA-seq generally allows for a higher confidence in its outcomes than microarray data, the unequal spacing of the time points and the lack of replicates make it challenging to handle the data adequately. In the original method [32], significantly differentially expressed genes were determined by t-test and then used to determine the number of connected versus disconnected nodes,

$$ R = \frac{N \left(k > 0 \right)}{N \left(k = 0 \right)}. $$
(1)
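As an illustration of Eq. (1), the ratio can be computed by counting nodes with and without links among a selected gene set. The sketch below rests on one reading of the method that is an assumption here: degrees k are counted in the subnetwork induced by the selected genes, and self-links are ignored. The function name and the toy data are hypothetical.

```python
def discrete_control_ratio(links, selected):
    """Ratio of connected (k > 0) to disconnected (k = 0) selected nodes.

    links: iterable of (a, b) gene pairs from the full network.
    selected: the differentially expressed genes.
    Assumption: k is the degree within the subnetwork induced by `selected`.
    """
    selected = set(selected)
    degree = {gene: 0 for gene in selected}
    for a, b in links:
        if a in selected and b in selected and a != b:
            degree[a] += 1
            degree[b] += 1
    connected = sum(1 for k in degree.values() if k > 0)
    isolated = len(degree) - connected
    # With no isolated nodes the ratio is unbounded.
    return connected / isolated if isolated else float("inf")


toy_links = [("a", "b"), ("b", "c"), ("c", "d")]
toy_selected = ["a", "b", "d", "e"]
ratio = discrete_control_ratio(toy_links, toy_selected)
```

In the toy example only the link between `a` and `b` lies within the selected set, so two nodes are connected and two are isolated.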

Assessing the differentially expressed genes in the current data set is much more difficult, however. Without replicates it is impossible to assess the variance, and applying an approximate fold change of 2 per hour is not possible since more than just the exponential growth phase is observed and the time intervals are uneven. Additionally, the mutant strains have one fewer time point of data collection. Since RNA-seq data are more reliable, however, we decided to simply use the continuous expression levels directly.

Transferring continuous expression levels to the aforementioned method is relatively straightforward. We are generally only interested in the response of a certain gene and not the absolute value. In addition, all genes should be comparable within a network. For this reason the expression levels of each gene over time were normalized between zero and unity.
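A minimal sketch of this per-gene rescaling is given below. The function name and the treatment of a completely flat profile (mapped to zeros) are assumptions made for illustration.

```python
def normalize_profile(levels):
    """Rescale one gene's expression time course to the interval [0, 1]."""
    lowest, highest = min(levels), max(levels)
    if highest == lowest:
        # A flat profile shows no response; mapping it to zeros is an assumption.
        return [0.0 for _ in levels]
    return [(value - lowest) / (highest - lowest) for value in levels]


# Hypothetical time course of a single gene.
profile = normalize_profile([2.0, 4.0, 6.0, 3.0])
```

After this step only the shape of the response over time remains, so genes with very different absolute expression become comparable within a network.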

There are interpolated expression levels available for almost all genes such that the basis for computing analog or digital control are the complete GPN or TRN, respectively. It is also clear that there are certain expectations for the distribution of relative expression levels. When considering the chromosome structure, it is obvious that genes within a transcription unit (TU) must have the same expression level. Similarly, all genes within one operon should have approximately the same expression level. For this reason we convert the networks under investigation to an aggregated form where nodes are no longer genes and TFs but operons with mean relative expression levels of all genes involved. The links in the network are then from operon node to operon node if there is a link from a gene or TF in one operon to a gene in another operon.
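The aggregation to operon level can be sketched as follows. The data structures (a gene-to-operon dictionary, links as pairs) and all names are hypothetical, and dropping links between genes of the same operon is an assumption based on the wording "in another operon".

```python
def aggregate_to_operons(gene_links, gene_to_operon, gene_levels):
    """Project gene-level links and expression levels onto operon nodes."""
    # Mean relative expression level of all genes in each operon.
    members = {}
    for gene, operon in gene_to_operon.items():
        if gene in gene_levels:
            members.setdefault(operon, []).append(gene_levels[gene])
    operon_levels = {op: sum(vals) / len(vals) for op, vals in members.items()}

    # An operon-level link exists if any gene-level link connects two operons.
    operon_links = set()
    for source, target in gene_links:
        op_source, op_target = gene_to_operon[source], gene_to_operon[target]
        if op_source != op_target:  # assumption: intra-operon links are dropped
            operon_links.add((op_source, op_target))
    return operon_links, operon_levels


mapping = {"a": "op1", "b": "op1", "c": "op2"}
links = [("a", "c"), ("b", "c"), ("a", "b")]
levels = {"a": 0.25, "b": 0.75, "c": 1.0}
operon_links, operon_levels = aggregate_to_operons(links, mapping, levels)
```

Using a set for the operon links collapses parallel gene-level links into a single operon-level link, so each operon pair is counted once in the later sums.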

Instead of counting connected differentially expressed genes versus disconnected ones, as in Eq. (1), we now compute a control strength per link that takes into account the relative expression level of each node involved. The sum over all links is then normalized by the total number of links. The continuous control ratio R cont., as its name suggests, is thus no longer discrete.

$$ R_{\text{cont.}} = \frac{\sum_{(j, i) \in M} C(i, j)}{|M|} $$
(2)

The pairs (j,i) are part of the existing set of links M. The TRN and GPN used for the subsequent analysis are listed as Additional Information: Additional file 1: GPN, gene level; Additional file 2: GPN, operon level; Additional file 3: TRN, gene level; Additional file 4: TRN, operon level.
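Eq. (2) is an average of per-link control strengths, which the following sketch makes explicit. The per-link function C(i, j) is passed in as a callable; the similarity measure used in the example is only a placeholder, and all names are hypothetical.

```python
def continuous_control_ratio(links, levels, strength):
    """Average per-link control strength over the link set M (Eq. 2).

    links: list of pairs (j, i), meaning a link from node j to node i.
    levels: dict node -> relative expression level at one time point.
    strength: callable (e_i, e_j) -> control strength of the link.
    """
    total = sum(strength(levels[i], levels[j]) for (j, i) in links)
    return total / len(links)


def similarity(e_i, e_j):
    # Placeholder strength: high when both levels are similar.
    return 1.0 - abs(e_i - e_j)


toy_links = [("a", "b"), ("b", "c")]
toy_levels = {"a": 1.0, "b": 0.5, "c": 0.5}
r_cont = continuous_control_ratio(toy_links, toy_levels, similarity)
```

Because every link contributes a value between zero and one in this example, the resulting ratio is also bounded by zero and one.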

Control type confidence

In order to evaluate whether an observed control strength is unexpectedly high or low, in [32], a set of control strengths was computed for comparison. Those control strengths were computed in a population of networks that was generated by randomly choosing the same number of nodes from the complete TRN or GPN as selected by the differentially expressed genes. From those data a z-score was computed which is the control type confidence (CTC) discussed previously.

$$ \text{CTC} = \frac{R_{\text{cont.}} - \mu_{R_{\text{cont.}}}}{\sigma_{R_{\text{cont.}}}}, $$
(3)

where \( \mu _{R_{\text {cont.}}} \) and \( \sigma _{R_{\text {cont.}}} \) are the mean and standard deviation of R cont. in the random sample. Here, we proceed similarly, but a few distinctions are of note. The RNA-seq expression data include almost every gene in the TRN and GPN. To introduce any randomness at all, and since we use the continuous expression levels, we simply shuffled the index i of all genes (or transcription units, or operons). As a result, each node of the TRN or GPN ends up with the expression levels of a different node. Since we have an expression matrix over all genes and time points, and since each gene's expression profile over time was normalized between zero and unity, we always shuffled entire time series and kept each of them intact. We measure the control ratio R cont. differently for analog and digital control.
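The shuffling null model and the z-score of Eq. (3) can be sketched as below. This is an illustration under assumptions: the number of shuffles, the seed handling, the function name and the toy chain network are all hypothetical, and the per-link strength is passed in as a callable.

```python
import random
import statistics


def ctc_over_time(links, series, strength, n_shuffles=200, seed=0):
    """Control type confidence (z-score of R_cont) at every time point.

    links: list of pairs (j, i); series: dict node -> list of levels over time.
    The null model permutes which node carries which complete time series.
    """
    rng = random.Random(seed)
    nodes = list(series)
    n_times = len(series[nodes[0]])

    def ratios(assignment):
        return [
            sum(strength(assignment[i][t], assignment[j][t]) for (j, i) in links)
            / len(links)
            for t in range(n_times)
        ]

    observed = ratios(series)
    null = []
    for _ in range(n_shuffles):
        permuted = nodes[:]
        rng.shuffle(permuted)
        # Each node receives the intact time series of another node.
        null.append(ratios({n: series[src] for n, src in zip(nodes, permuted)}))

    ctc = []
    for t in range(n_times):
        sample = [run[t] for run in null]
        mu, sigma = statistics.mean(sample), statistics.stdev(sample)
        ctc.append((observed[t] - mu) / sigma if sigma > 0 else float("nan"))
    return ctc


# Toy chain in which neighboring nodes have similar levels at both time points.
toy_series = {k: [k / 19, 1 - k / 19] for k in range(20)}
toy_links = [(k, k + 1) for k in range(19)]
toy_ctc = ctc_over_time(toy_links, toy_series, lambda a, b: 1 - abs(a - b))
```

In the toy chain, neighbors differ by only 1/19 in level, whereas randomly reassigned series differ much more on average, so the z-scores come out clearly positive.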

Absolute control strength

Computing analog control strength is the more obvious scenario. Links are based on proximity of genes and we expect neighboring genes to have similar relative expression values due to a high likelihood of being located in the same region of analog control. It seems sensible to regard relative expression levels at the same time point and compute the control strength of a link as follows:

$$ C_{A}(i, j) = 1 - \left| e_{i} - e_{j} \right|. $$
(4)

That means, there is a high absolute control strength when relative expression levels e i and e j of the neighboring genes i and j are of similar magnitude. For the undirected GPN the distinction of direction has no impact.
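As a small worked example of Eq. (4), the sketch below first builds proximity links on a circular chromosome and then evaluates the absolute control strength on them. The genome length, the distance threshold, the gene positions and all function names are hypothetical values chosen for illustration; the link rule follows the description of the gene proximity network in the caption of Fig. 1.

```python
def circular_distance(pos_a, pos_b, genome_length):
    """Shortest distance between two positions on a circular chromosome."""
    d = abs(pos_a - pos_b) % genome_length
    return min(d, genome_length - d)


def proximity_links(positions, genome_length, threshold):
    """Undirected links between all pairs closer than the threshold."""
    names = sorted(positions)
    links = []
    for index, a in enumerate(names):
        for b in names[index + 1:]:
            if circular_distance(positions[a], positions[b], genome_length) < threshold:
                links.append((a, b))
    return links


def analog_strength(e_i, e_j):
    """Absolute control strength of Eq. (4); symmetric in its arguments."""
    return 1.0 - abs(e_i - e_j)


positions = {"g1": 100, "g2": 900, "g3": 9950}   # hypothetical coordinates
levels = {"g1": 0.75, "g2": 0.25, "g3": 0.5}     # hypothetical relative levels
links = proximity_links(positions, genome_length=10000, threshold=500)
strengths = [analog_strength(levels[a], levels[b]) for a, b in links]
```

Here `g1` and `g3` are neighbors only because the coordinate wraps around the origin, which is why the circular distance rather than the plain difference is used.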

Functional control strength

The regulatory function of links in the TRN and thus digital control present much more of a challenge to measure appropriately. The regulatory interaction is mediated by transcription factors which may be liable to any of the following situations:

  (i) Transcription factors are proteins, that means, they are the result of first transcription, then translation and potentially post-translational modification (PTM).

  (ii) They may also depend on co-factors for the right conformation.

  (iii) All these steps may cause a time delay between activation of gene i and a regulatory action at gene j and strongly depend on copy numbers and binding dynamics of the TF.

Obviously, Eq. (4) is a poor approximation of digital control. We can improve it by taking into account the regulatory function of a link. We ignore links that have an unknown or dual role since only a handful of each exist anyway. The following function is applied separately for links A ij that are activating or inhibiting. The matrix A is a variant of the adjacency matrix, which also incorporates the type of the link. The element A ij , denoting the influence of the jth gene on the ith gene, has the following possible entries: 0 (no link), 1 (activating) and −1 (inhibitory).

$$ C_{F}(i, j) =\left\{ \begin{array}{ll} 1 - \left| e_{i} - e_{j} \right|, & \text{if} ~ A_{ij} = 1 \\ \left| e_{i} - e_{j} \right|, & \text{if} ~ A_{ij} = -1 \end{array} \right. $$
(5)
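Eq. (5) can be written as a short function of the two relative expression levels and the link type. The sketch below is illustrative only; the node names and levels are hypothetical, and raising an error for other link types mirrors the decision above to ignore unknown and dual links.

```python
def functional_strength(e_i, e_j, sign):
    """Functional control strength of Eq. (5) for a link from j to i.

    sign is the entry A_ij: 1 for activating, -1 for inhibitory links.
    """
    difference = abs(e_i - e_j)
    if sign == 1:
        return 1.0 - difference   # activation: similar levels score high
    if sign == -1:
        return difference         # inhibition: opposite levels score high
    raise ValueError("only activating (1) or inhibitory (-1) links are scored")


# Hypothetical regulator with one activated and one repressed target.
signed_links = {("tf", "target_a"): 1, ("tf", "target_b"): -1}  # (j, i) -> A_ij
levels = {"tf": 1.0, "target_a": 0.75, "target_b": 0.25}
scores = {
    (j, i): functional_strength(levels[i], levels[j], sign)
    for (j, i), sign in signed_links.items()
}
```

Both links in the example behave as their annotation predicts: the activated target follows the regulator and the repressed target is low while the regulator is high, so both receive the same score.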

Random Boolean network model

In order to test and ‘calibrate’ the statistical methods described above, we use a simple model, random Boolean networks, for generating artificial gene expression data. We generate a random Erdős-Rényi (ER) graph with N nodes and M directed links. After that, \( M_A \) links are declared activating, while the other \( M_I = M - M_A \) links are declared inhibitory. Following the prescription from [23], we then add self-inhibitory links for all nodes which have no incoming inhibitory links. Again, in accordance with the model from [23], the following update rule is used to generate dynamics,

$$ {x_{i}}(t + 1) =\left\{ \begin{array}{ll} 1, & \text{if} ~ \sum\limits_{j} {{A_{ij}}{x_{j}}(t)} > 0 \\ 0, & \text{if} ~ \sum\limits_{j} {{A_{ij}}{x_{j}}(t)} < 0 \\ {x_{i}}(t), & \text{if} ~ \sum\limits_{j} {{A_{ij}}{x_{j}}(t)} = 0 \end{array} \right., $$
(6)

where \( x_i(t) \) denotes the binary state, 1 = ON or 0 = OFF, of the ith gene at time t.
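The network construction and the update rule of Eq. (6) can be sketched as follows. This is a minimal illustration under assumptions: links are drawn as distinct ordered pairs without self-links before the self-inhibitory links are added, all nodes are updated synchronously, and the function names and parameters are hypothetical. Enumerating all ordered pairs is only practical for small N.

```python
import random


def build_network(n_nodes, n_activating, n_inhibiting, seed=0):
    """Random directed network as a dict: node i -> list of (j, sign) inputs."""
    rng = random.Random(seed)
    pairs = [(j, i) for j in range(n_nodes) for i in range(n_nodes) if i != j]
    chosen = rng.sample(pairs, n_activating + n_inhibiting)
    incoming = {i: [] for i in range(n_nodes)}
    for index, (j, i) in enumerate(chosen):
        # The sample is in random order, so the first block is a random subset.
        incoming[i].append((j, 1 if index < n_activating else -1))
    for i in range(n_nodes):
        # Add a self-inhibitory link where no inhibitory input exists.
        if not any(sign == -1 for _, sign in incoming[i]):
            incoming[i].append((i, -1))
    return incoming


def step(state, incoming):
    """One synchronous update of all nodes according to Eq. (6)."""
    new_state = []
    for i, inputs in incoming.items():
        total = sum(sign * state[j] for j, sign in inputs)
        if total > 0:
            new_state.append(1)
        elif total < 0:
            new_state.append(0)
        else:
            new_state.append(state[i])  # zero input: keep the previous state
    return new_state


# Hand-built two-node example: node 1 activates node 0, node 0 inhibits node 1.
toy = {0: [(1, 1)], 1: [(0, -1)]}
trajectory = [[0, 1]]
for _ in range(2):
    trajectory.append(step(trajectory[-1], toy))
```

In the two-node example the state [0, 1] first switches node 0 on and then, one step later, node 1 off, after which the system rests in the fixed point [1, 0].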

After a small number of time steps (typically five), such an RBN will settle into a fixed point or a cyclic attractor. These asymptotic states are ideal for analysis with the classical digital control strength defined in [32]. In order to have a longer time series of continuous values available to our analysis, we simulate a large number of short time series (thus sampling transients leading towards these asymptotic states), concatenate them and then compute the average activation of a gene in a certain time window. We use \( 10^4 \) short time series, each 10 time steps long, and average activity over windows of \( 10^3 \) time steps.
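The concatenation of short runs and the window averaging can be sketched as below. The update rule is passed in as a callable so that the sketch stays independent of a particular network. Two details are assumptions made here: the random initial state counts as the first time step of each run, and the averaging windows do not overlap. All names are hypothetical.

```python
import random


def concatenated_transients(update, n_nodes, n_runs, run_length, seed=0):
    """Concatenate short runs that each start from a random binary state."""
    rng = random.Random(seed)
    series = []
    for _ in range(n_runs):
        state = [rng.randint(0, 1) for _ in range(n_nodes)]
        for _ in range(run_length):
            series.append(state)
            state = update(state)
    return series


def window_average(series, window):
    """Mean ON fraction of every node in consecutive, non-overlapping windows."""
    n_nodes = len(series[0])
    averages = []
    for start in range(0, len(series) - window + 1, window):
        block = series[start:start + window]
        averages.append([sum(s[i] for s in block) / window for i in range(n_nodes)])
    return averages


# Trivial update rule (states never change), used only to show the bookkeeping.
long_series = concatenated_transients(lambda s: list(s), n_nodes=3, n_runs=4, run_length=10)
levels = window_average([[1, 0], [0, 0], [1, 1], [1, 0]], window=2)
```

With the parameters quoted in the text, each window would span many short runs, so the averaged value of a node reflects how often it is ON across transients from many different initial conditions.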

The main assumption of the minimal model we used to calibrate our quantification methods is that on a microscopic time scale, ON and OFF are meaningful states of individual genes. We thus neglect (at such a microscopic level) a more gradual description of gene activity. In our RBN formalism, continuous gene expression levels arise as time averages: On this coarse-grained time scale, a high ‘expression level’ means that in a given time window the gene under consideration has been in the ON state very often. A second processing step (in addition to averaging the binary gene states over time windows) is the normalization of the data. In principle, two normalizations are possible here: (1) normalizing the sum over all time points for each individual gene; (2) for each time point, normalizing the sum over all gene activities. In order to match the decision made for normalizing the experimental data (time course of each gene normalized individually), we here select the first normalization variant.
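The two normalization variants can be contrasted in a few lines. The sketch assumes the averaged activities are stored as a matrix with one row per gene and one column per time window; the function names and the handling of all-zero rows or columns are assumptions.

```python
def normalize_per_gene(matrix):
    """Variant (1): each gene's values over all time points sum to one."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([value / total if total else 0.0 for value in row])
    return result


def normalize_per_time_point(matrix):
    """Variant (2): at each time point the values of all genes sum to one."""
    totals = [sum(column) for column in zip(*matrix)]
    return [
        [value / total if total else 0.0 for value, total in zip(row, totals)]
        for row in matrix
    ]


activity = [[1, 3], [2, 2]]          # hypothetical: 2 genes, 2 time windows
by_gene = normalize_per_gene(activity)
by_time = normalize_per_time_point(activity)
```

Variant (1) removes differences in overall activity between genes, while variant (2) removes differences in overall activity between time points; the text above selects variant (1).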

Results and discussion

On the methodological side, the main result of our investigation is to provide and test a new method for analyzing time-resolved gene expression data. With the wide availability of next-generation sequencing methods, high-quality time-resolved data are now becoming rapidly available for diverse biological and biomedical situations.

In the case of time-resolved data, several extensions of the original method from [32] (see also Eq. 1) are possible. For measuring analog control type confidence (CTC) we will use the absolute control strength C A (Eq. 4). This choice is motivated by the expectation that in a neighborhood structure like the gene proximity network (GPN), most genes will be within the same domain of analog control and are expected to have very similar expression levels.

For digital control, represented by the TRN, the situation is less clear. Each link has a regulatory function, an activating or inhibiting interaction, and in reality this function is performed by a transcription factor which may introduce a delay between a signal arriving at one gene and the result of its regulatory function at another gene. Hence, we decided to test these digital control extensions using synthetic data.

Simulated expression data

In order to understand the capability of each of the four definitions of control strengths for time-resolved data (see below), we employ numerical simulations using random Boolean networks (RBNs) to generate synthetic data sets in which all systematic information is generated by the underlying regulatory network. RBNs have been very successful in describing the dynamics of small-scale regulatory systems (see, e.g., [22, 23, 34]). Variants of RBNs have also been used as minimal models of signaling pathways [24, 35]. We therefore expect a very high match between the ‘transcriptome data’ and the network architecture, i.e. high CTC values (see definition in Eq. 3).

Clearly, real transcriptome profiles differ dramatically from the output of such RBNs. Gene expression data are not binary (ON or OFF) but have a broad distribution of values. Also, the actual switching events arising from the regulation are more gradual and also affected by a multitude of other factors beyond the regulation by transcription factors, for example, external stimuli, noise or signaling.

We have therefore applied a sequence of processing steps to the binary data, which are intended to mimic some of those effects. By running the RBN for a few time steps starting from random initial conditions and then putting these runs into a longer time series, we have a stylized version of a system kept in a perpetual transient due to external influences. The details can be found in the Section “Random Boolean network model”.

The networks are equal in size to the latest TRN published by RegulonDB [36] with N=1791 genes, M A =2453 activating and M I =2095 inhibitory interactions. Additionally, in accordance with the scheme described in [23], inhibitory self-loops are introduced at nodes that have only incident activating links. Whereas the TRN has a broad, heterogeneous degree distribution, the RBNs’ degree distributions resemble a normal distribution. A sample network and time series of ON/OFF states can be found in Additional file 5: Figure A.1.

Arriving at an RBN prediction of the continuous ‘expression level’ of a node required normalizing the sum of its ON states within a window that includes multiple short time series simulated from different random initial states. The larger the window, the broader the possible levels. Even for small window sizes, the range of values is much wider than for a uniform random choice of states of equal dimension.

In this calibration step using RBNs, several technical aspects of the data processing can be manipulated and their effect on the control strengths can be studied. Such technical aspects are the normalization, the number of initial conditions used, the simulation time, the size of the averaging window, and the amount of fluctuations in the simulated ‘data’. The key finding of this calibration step is that the absolute control strength C A (see Eq. 4) is apparently too simplistic to capture the underlying regulatory network, while the three other control strengths clearly identify a high match between the ‘transcriptome data’ and the regulatory network, with all three producing approximately the same levels of CTC values. In our subsequent analyses, we use the simple, best-performing functional control strength C F as a measure for the digital component of control. It is described in detail in the Section “Functional control strength”. The full comparison of the different control strengths can be found in the supplements (Section “Definition of control strengths”).

Time-series gene expression data

Before application of the newly defined methods, we present an impression of the time-resolved RNA-seq data measured in the wild type (first published in [31]) and the two mutants fis and hns. The relevant experiments and data transformations are described in Sections “Cell growth conditions and mRNA isolation” and “Gene expression analysis”.

In Fig. 2, we show some examples of gene expression in the wild type. They are either directly relevant to our study, i.e., fis and hns, or known to be active during different phases of the growth cycle. dps is a marker gene of the stationary phase. gyrA and gyrB encode the subunits of DNA gyrase, which is a member of the topoisomerase family and can increase DNA supercoiling. In addition, it is required for DNA synthesis and replication fork progression [37–39]. rpoD encodes the σ 70 factor, which is the major constituent of RNA polymerase during exponential growth. rpoS encodes the σ S factor, which is abundant during the transition to the stationary phase. Overall, the changes in gene expression levels along the growth curve of the cell culture (black points) shown in Fig. 2 are consistent with established knowledge.

Fig. 2

Interpolated and normalized gene expression levels. a The levels published in [31] are shown here normalized between zero and unity. There are seven examples of gene levels in the wild type: fis, which is prominent in the early to mid exponential growth phase; hns, strongly expressed in the late exponential growth phase; the σ 70-factor encoding gene rpoD, which is more active during exponential growth; the σ S-factor encoding gene rpoS, mostly active during the transition from late exponential to stationary growth phase; dps, which is associated with the stationary phase; and the two gyrase-encoding genes gyrA and gyrB. DNA gyrase is important for altering chromosome structure and the progression of the replication fork. b The black dots and the corresponding polynomial fit depict the cell density OD 600 in order to clearly discriminate between the growth phases

Application to real expression data

Our first aim is to qualitatively reproduce the (static) observations from [32]. The results in [32] are based on gene expression data obtained during the exponential growth phase which corresponds to the 60 – 120 min time window in the time-resolved data analyzed here. The main goal of our investigation is to evaluate the hypothesis formulated in [32] of a balancing of digital and analog control.

With these general growth cycle-dependent expression results in mind, we can interpret the outcomes of analog and digital control. Importantly, our results are based on the operon projections of the TRN and GPN (see Section “Gene expression analysis” for details). Figure 3 shows the analog CTC for the absolute control strength C A , which has a distinct profile. First of all, even the lowest values are above 3 which, in terms of z-scores, is a significant result. This indicates that the distribution of absolute control in the random realizations of the chosen null model is far away from the control value in the data. Our null model does take into account operon structure but otherwise simply distributes the gene expression values randomly. However, the characteristic peaks of analog CTC in the wild type towards the beginning and end of the bacterial growth cycle match our previous observations presented in Fig. 2. A possibly confounding factor in the earlier rise of analog CTC, around the 220 min mark, in the wild type as compared to the mutant strains, is the extra gene expression measurement after 5 h (see Table 1) which is then incorporated in the polynomial fit to the time series (cf. Section “Gene expression analysis”).

Fig. 3

Absolute analog control type confidence (CTC) for real gene expression data. The analog CTC measured by absolute control C A , Eq. (4), has a distinct profile. Even the lowest values are above 3 which, in terms of z-scores, is a significant result. This indicates that the distribution of absolute control in the random realizations of the chosen null model is far away from the control value in the data. There are two characteristic peaks of analog CTC towards the beginning and end of the bacterial growth cycle, which match experimental observations [31]. The earlier rise of analog CTC in the wildtype, around the 250 min mark, is most likely due to the extra gene expression measurement after 5 h (see Table 1) which is then incorporated in the polynomial fit to the time series (cf. Section “Gene expression analysis”)

We only show the functional control C F results for digital CTC since that is the measure suggested by the RBN results. The complete results can be seen in Additional file 5: Figures A.3, A.4 and A.5 but will not be discussed here. Digital CTC based on C F (Fig. 4) has a profile fairly similar to that of analog CTC, but it is shifted by approximately −50 min, has a sharper first peak, and fluctuates around zero, which is an indicator of a reasonable null model. The digital CTC values at exactly 120 min are in good qualitative agreement with the result in Figure 3b of [32], as shown in Fig. 5. Both results show the same increasing order of wild type, hns mutant and fis mutant. Effects of the additional measuring point for the wild type are possibly apparent in digital CTC, too. The onset of positive digital CTC around 200 min is much earlier than that of analog CTC (300 min) and may suggest that the change in the global gene expression pattern is initiated by the digital component and followed by the analog (structural) component approximately 100 min later. The trends of digital and analog CTC are directly compared in Fig. 6, which makes it easier to follow the above description.

Fig. 4

Functional digital control type confidence (CTC) for real gene expression data. Digital CTC based on C F Eq. (5) has a fairly similar profile as compared to analog CTC (Fig. 3) but shifted by –50 min. It has a sharp first peak and actually fluctuates around zero which is an indicator of a reasonable null model. Effects of the additional measuring point at 300 min for the wild type are fuzzier in digital CTC. The onset of positive digital CTC around 200 min is much earlier as compared to the rise in analog CTC (300 min)

Fig. 5

Digital CTC: Comparing current results with those from [32]. Although the magnitudes of the individual CTC results vary dramatically, the relative differences between the strains are surprisingly similar

Fig. 6

Normalized analog CTC (C A ) and digital CTC (C F ). A direct comparison of analog and digital CTC which reveals the time dependent changes in each type of CTC

Conclusions

For static transcriptome data it has been shown that control strengths are a useful method for evaluating the agreement between the pattern of gene expression and the underlying transcriptional regulatory network [25, 30, 32]. Here we have extended these methods to the time domain and applied them to a novel set of gene expression profiles. The results obtained in a simple model of random Boolean dynamics have helped us to evaluate the different types of digital control strengths considered.

Conceptually, this approach is reminiscent of the comparison of structural connectivity (SC: the underlying interaction graph) and functional connectivity (FC: a network derived from similarities in dynamical behavior among nodes) in computational neuroscience [8, 10, 40]. As in this case of SC/FC correlations, we analyze the ‘effective network’, i.e., the dynamical usage pattern of a given static interaction network.

The results for real time-series RNA-seq expression data show that analog CTC has two phases over the course of the bacterial growth cycle in which it is strongest. This observation is in accordance with another evaluation of the same data [31]. The first phase, which is around the 60 min measuring point, coincides with a peak for digital CTC. This suggests that both types of control are involved in shaping the regulatory patterns that are characteristic of the exponential growth phase.

The early peaks of analog and digital CTC (around 40 min) closely coincide with the rise of rpoD and fis expression. The rpoD gene encodes the RNA polymerase major σ 70 subunit forming the vegetative RNAP σ 70 holoenzyme, which transcribes the strong ribosomal operons organized around the chromosomal replication origin (OriC). These operons are activated by FIS during exponential growth and accumulate RNA polymerase in transcription foci formed in the vicinity of OriC [41] and delimiting the chromosomal rrn functional domain [3]. Accordingly, during this stage extensive communications are observed between the functionally related (primarily anabolic) genes across the chromosomal arms in the OriC end of the chromosome [30]. These observations are wholly consistent with the high analog CTC observed during the early stage of growth. At the same time fis acts as a hub regulating numerous genes in the TRN, and so the early activation of fis expression is also consistent with the early peak of the digital CTC.

The second peak of digital CTC (around 200 min) coincides with the transition to stationary phase and the corresponding increase in rpoS and hns expression. The rpoS gene encodes the stationary phase σ S subunit of the RNA polymerase involved in transcription of catabolic genes [30], whereas hns acts as a hub in the TRN, increasingly binding its genomic targets on transition to stationary phase [42]. The two (early and late) peaks of the digital CTC thus closely coincide with sequential activation of anabolic and catabolic genes, and thus mark the commencement of exponential growth and transition to stationary phase, respectively. Previous findings that the vegetative RNAP σ 70 and the stationary phase RNAP σ S holoenzymes cooperate with different sets of transcription factors [3] are fully consistent with this notion. Moreover, activation of RNAP σ S holoenzyme on transition to stationary phase is associated with DNA relaxation and expression of other abundant NAPs such as IHF, Lrp and Dps followed by morphological reorganisation of the nucleoid and activation of communications between the chromosomal arms in the replication terminus [26, 39, 43], again in keeping with the second peak of analog CTC observed in our study.

The second phase of strong analog CTC is preceded by a rise in digital CTC that occurs at least an hour earlier. This may be an indication of the digital component inducing strong changes in the analog component. We should be able to identify characteristic signals on other levels of the growth cycle experiment. These results are further evidence for the compensatory interplay of digital and analog control and show interesting interactions over the growth cycle of E. coli.

In future studies two steps will be necessary:

  (i) Consideration of a calibration model that is based on both an artificial digital and an artificial analog component rather than just the one.

  (ii) The type of evaluation performed here should be applied to other time series data and, if replicates are available, compared with the results for a discrete evaluation based on differentially expressed genes.