Background

In natural, medical or engineering environments, bacteria often exist as sessile communities called biofilms [1], which are exquisite structures caused by a genetically programmed developmental process in which each stage entails dramatic modifications at the genetic, biochemical, and phenotypic levels [2]. This phenotype enables bacteria to adhere and anchor to surfaces in aqueous environments [3], so the cells acquire specific advantages when invading tissues, such as antiobiotic and shear stress resistance. It is estimated that biofilms are involved in 65% of human bacterial infections [4], since cells in biofilms are 1000 times more resistant than cells in the planktonic state, making medical treatments fail [1].

Many authors [5] have identified five steps in biofilm formation: (i) reversible attachment, (ii) irreversible attachment, (iii) maturation-1, (iv) maturation-2, and finally (v) dispersion [6]. Each step requires reprogramming of gene expression, and this reprogramming occurs in response to environmental changes [7, 6]. The full development of the biofilm includes the existence of a three-dimensional structure made of a polysaccharide matrix that contains water channels for transporting nutrients and removing waste [5, 7]. Different organelles play big roles during biofilm formation at different stages on the bacterial surface, e.g. during reversible attachment flagella propel cells toward the surface to overcome electrostatic interactions [6]. The irreversible attachment process requires the cells to lose their flagella and develop adhesive organelles such as curli or fimbria to attach to the surface [1]. Finally, the production of colanic acid capsules allows the three-dimensional structures of the mature biofilm to be constructed [8]. For specific organelles to appear at each step in the proper order, expression of these organelles must be coordinated and thus regulated by a subset of external signals, regulators and secondary messengers, e.g. flagellum biogenesis requires the positive regulator FlhD/FlhC (FlhDC), environmental conditions such as appropriate temperature, osmolarity, and pH, the presence of acetate and the transcription factor HdfR, etc. [6]. Many types of genes with different functions seem to be involved in biofilm formation [9].

Recently, there has been a dramatic upsurge of research on biofilms aimed at preventing or controlling their formation or eradicating them [10], since they are often deleterious and more complex to treat than planktonic forms owing to their high resistance to antibiotics [2]. It is therefore important to understand the genetic basis of biofilm formation in order to find effective ways to prevent it. Whole genome profiling for each stage provides invaluable information about the underpinnings of the regulation process [1, 11, 5, 4]. Domka et al. [12] compared the gene expression in cells forming biofilms and suspended (planktonic) cells over time in E. coli K-12.

The gene expression profile could be further interpreted to elucidate the gene regulation network and variations in its topology over time. Once established, mathematical models can be used to predict the system dynamics and understand it deeply. These models are based on different approaches such as conservation mass balance and the mass action law, and involve the use of ordinary differential equations, or stochastic kinetics in cases where it is no longer possible to apply mass action assumptions. However, these techniques demand extensive development if a unique solution is required, and this kinetic background cannot easily be obtained [13]. For this reason, approaches that do not demand so much information such as Boolean based network kinetics constitute an ideal way of gaining a deeper understanding of the dynamics of gene networks, because this technique just requires the topology of the network for different time points [1416].

The value of a Boolean logic network modeling resides in translating a continuous to a discrete dynamic system [13]. However, this discretization is only possible when each node response can be described by binary variables [14], which is the general case for gene transcription regardless of the biological system. Boolean networks allow regulatory networks to be modeled and analyzed efficiently, making strong simplifying assumptions about the structure and dynamics of a genetic regulatory system [17]. In this model each cluster (a group of genes) at a given time can display one or other of two states, on or off. The expression of a cluster A at time t + 1 is modeled by a Boolean function, whose entries are the expression at time t of all K clusters related to cluster A. Generally, KN where N is the total number of clusters obtained. After a succession of times, the system traces its history in a space of states [14].

Domka et al. [12] compared the differential gene expression over time when E. coli form biofilms, and gene expression between cells in suspension and biofilms. However, this study was limited to determining the proportion of genes induced and repressed between 2.5 and 5 fold, without identifying the most important networks or pathways for biofilm formation. Therefore, the aim of the present study is to build a network model of biofilm formation in E. coli K-12, to identify clusters and regulators during biofilm formation and to analyze the dynamics of each cluster to corroborate the results previously obtained through whole genome profiling.

Results and discussion

Ten clusters for four time points were obtained with similarity levels greater than 82%. We then identified relevant clusters regarding biofilm formation for E. coli (Table 1) to calculate the weight matrix (Table 2). The connection between expression (model) and phenotype (biological conclusions) is based on the expression and presence of the positive regulators (clusters A and D).

Table 1 Main features of each cluster in biofilm formation
Table 2 Weight matrix for biofilm formation

Overall, we found no effect on the expression trend for the different initial conditions (Table 3) in Clusters E, G, H, I & J as they reached the active state, regardless of the initial population (Figure 1). This result corroborated the importance of such clusters for the whole biofilm formation process in basic cell functions such as replication, transcription, translation and respiration.

Table 3 Initial value for concentrations of gene products of the network
Figure 1
figure 1

Profiles for the ten clusters. Plots for the ten clusters (A-J) show the expression profile (y-axis) vs. time (x-axis) for three scenarios: condition I (-), condition II (-.) and condition 3 (...). Biofilm is not formed under the null scenario because the biofilm positive regulators (clusters A and D) are repressed [expression value (ev) = 0] for four negative regulators (clusters B, C, G and H) [ev = 700]. Quorum sensing (cluster F) is at basal level [ev = 500].

We first observed the gene expression dynamics with non-zero initial conditions for all clusters. Interestingly, the global positive regulator FlhDC cluster, capable of regulating the expression of curli and flagella, was found to be repressed after 0.4 s, possibly indicating a negative effect from the FlhD repressors OmpR, RcsCD, hdfR and LrhA. FlhD behavior was also propagated for the H-NS and QseBC positive regulator clusters (Figure 1). Overall, these results suggest that the simultaneous presence of all clusters does not allow biofilms to form owing to repression of key clusters at the onset of the process (clusters A, D and F) [6, 1824]. Consistent with experimental data, cluster F, which is quorum sensing-related, does not follow flagella and H-NS regulator behavior because it is not strongly coupled with OmpR and RcsCD. Phenotypically, these results suggest that quorum sensing requires the strong action of the flagella apparatus in order to play a role during activation; these results were previously shown by González Barrios et al. [4].

In silico knockouts were also carried out in order to corroborate the phenotypes previously reported for critical genes during the different stages of the biofilm formation process. First, we wanted to evaluate the effect of deleting the EnvZ/OmpR regulator (cluster B) in order to establish a correspondence between this regulator and the master flagella regulon and determine the potential formation of biofilm when the cluster was knocked out. Overall, we again found no response for biofilm formation since the presence of RcsCD, hdfR and LrhA, which are also reported to be negative regulators in this context, possibly causing the absence of the expression of cluster A, the master flagella regulon (FlhDC), the H-NS regulator (cluster D) and the quorum sensing system (Cluster F). Regarding the initial conditions for this knockout, we noticed that even though the system displays no positive response in silico when none is present under the initial conditions, FlhDC almost reaches a positive value, possibly indicating that the initial absence of cluster B could lead the cells to establish biofilm depending on the time response; in other words, the lag time for RcsBC to reach the active state when cluster B is forced to remain shut down. This suggests synergism among the three repressor systems in order to avoid the formation of the biofilm (Figure 2). Moreover, we also found the same results when cluster C (RcsBC) was deleted (data not shown), corroborating this hypothesis of synergism.

Figure 2
figure 2

Profiles for the ten clusters with virtual knockout in cluster B. Plots for the ten clusters (A-J) show the expression profile (y-axis) vs. time (x-axis) for three scenarios: condition I (-), condition II (-.) and condition 3 (...). Biofilm is not formed under the null scenario because the biofilm positive regulators (clusters A and D) are repressed [expression value (ev) = 0] for three negative regulators (clusters C, G and H) [ev = 700]. However, under scenario II, the biofilm machinery is shown to be activated at 4000s.

Two additional knockouts were analyzed with the aim of identifying the expected positive response from the system. First, we shut down the repressor clusters EnvZ/OmpR and RcsBC, as this has been previously reported to inhibit the flagella response. In this case the master flagella regulon, FlhDC, demonstrated that the biofilm formation is activated and these responses are rapid when the stable state is reached (additional file 1). Also, the major role of the RcsCD and OmpR clusters in regulation was corroborated [6] as these genes remained inactivated, in contrast to hdfR and LrhA. Nevertheless, the fact that A, B, C, D and F present a zero initial condition leads to inactivation of H-NS and QseBC because there is insufficient of the activator FlhCD and a negative phenotypical response (no biofilm formed). Therefore, the biofilm regulation network demands positive action from the inductor clusters or genes directly involved in the positive response such as FlhDC, and the cooperative interaction of the negative regulators [6]. In order to corroborate this conclusion, we obtained the profiles for the cases in which a virtual knockout in cluster A was made and clusters A, B and C were knocked out, and in both cases no biofilm was formed (results not shown). It is important to highlight that virtual knockout of hdfR and LrhA could not be done because these are congregated in fundamental clusters.

Finally, we modeled the case when, in a community of cells that have already formed a biofilm, a signal that activates the OmpR regulator and then RcsCD suddenly arrives (results not shown). The profile displays a biofilm formed until 7000 s and then OmpR begins to produce enough protein to inhibit biofilm formation, concomitantly with RcdCD at 15000 s. Although OmpR starts at 7000 s, biofilm formation (cluster A, D profiles) does not begin to disappear immediately but around 11500 s. This is because OmpR expression is initially absent, and genes for surface organelles are not affected during the time necessary to raise their protein expression.

Conclusions

In this work we have built a network model of biofilm formation in E. coli K-12, using cluster analysis and Boolean network kinetics, obtaining phenotypes already reported. Throughout the modeling, we have used the most important regulators of biofilm formation (FlhD, H-NS, QseBC, OmpR, RcsCD, hdfR and LrhA) under different initial conditions, and obtained enough evidence to demonstrate that networks can efficiently model a complex gene regulatory system such as the biofilm formation in E. coli K-12. As in nature, the model simulated successful biofilm formation when the most significant negative regulators (OmpR and RcsCD) are inactive, and when the positive global regulator FlhDC is active.

Methods

Data

The Affymetrix Microarray data used were obtained from the NCBI Gene Expression Omnibus (GEO accession: GSE3905) [25, 26]. These data were originally obtained by Domka et al. [12]. The data included four time points: 4, 7, 15 and 24 h for cells presenting a biofilm phenotype and for cells in the planktonic state. For this purpose, they grew the two groups of cells in the same reactor in order to eliminate errors associated with different environment conditions. Each set of data included the level of expression of 7312 genes, and in total 8 sets of data were obtained [12].

Gene clustering

Whole genome profiling allows cluster analysis to be carried out so the genes and clusters that play important roles at each stage can be elucidated. Clustering of gene expression was performed using MultiExperiment Viewer v4.38 [27]. The K-means/medians clustering support algorithm was used to make 10 clusters for each time point with the expression levels for biofilm and planktonic cells. The number of clusters was calculated using the Figure of Merit encompassed in the software, and Euclidean distance was used as the metric distance. The clustering process was restricted to eighty iterations and twenty K-means/K-medians with a threshold percentage of occurrences in same cluster of 0.8. We compared the level of similarity, confirming that most genes were clustered with the same genes for all time points, in order to obtain ten merged clusters.

Model description

A network model is useful in applications where the information relevant to the problem to be solved is scant or incomplete, but where data are available. Dynamic modeling gives the relevant information about time responses that the ordinary Boolean approximation cannot provide. We then employ an enhanced Boolean network in our model, which reduces the need for large sets of data for network training and displays the dynamic evolution of the network. In the network model, the rate of expression of a gene i is given as [17]:

(1)

Where the first term quantifies the regulatory effect of other genes and the second term describes the degradation. k 1i and k 2i are the rate constants. The decay rate constant k 2i can be expressed through the protein half-life t 1/2 as:

(2)

The regulatory effect of gene f i , can be represented as follows:

(3)

Where τ i is the bias (used as a delay parameter for the reaction), y j represents the concentrations of gene products of the network, and w ij is the correlation coefficient matrix, called the weight matrix. This matrix describes how various genes in the model affect each other's expression patterns over a period of time. A strong positive correlation indicates that the genes may be co-expressed and have a value around one, and a strong negative correlation indicates that the genes may inhibit each other's expression and have a value around negative one. Therefore the Boolean nature of this model is conserved. Nevertheless, this approach constitutes an improved version as it involves the dynamics of the process. The weight matrix was obtained from different microarrays aimed at studying gene regulation for the E. coli K-12 biofilm formation process [2, 47, 12, 2833]. In order to keep track of the genes that play a major role according to the literature, we first determined the interaction among them over time, localizing them in different clusters. We only considered the most relevant genes because these correlations are taken from literature and their functions are well known.

RNA mass balance allows the gene expression profile to be described in the following manner:

(4)

We solved the system of ordinary differential equations with the MATLAB® platform using a fourth order Runge Kutta algorithm, with absolute and relative tolerances of 1.0e-006 and 1.0e-003, respectively. To reduce the dimensionality of the solution space we assumed a single time delay τ i = τ for every regulatory interaction. Kinetic parameters were determined based on Gupta et al. [17] (τ = 1, t 1/2 = 800s, and k 1i = 0.6 mol/s). Simulations were performed using 2.64 GHz Intel Core 2 Duo Processor T7500, 2 GB RAM. (Average time per analysis was 30 s).