A Formal Model for the Simulation and Analysis of Early Bioﬁlm Formation

. Bioﬁlms are structured communities of bacterial cells adherent to a surface. This bacterial state is called sessile . This paper focuses on the modelling of the transition between plank-tonic and sessile state using Real-time Maude as the modelling language. With more and more bacteria joining the sessile community, the likelihood of producing a bioﬁlm increases. Once the percentage of bacterial cells that adheres to the surface reaches a threshold, which is speciﬁc for the considered bacterium species, a permanent bioﬁlm is formed. An important challenge is to predict the time needed for the formation of a bioﬁlm on a speciﬁc surface, in order to plan when the material infrastructure that comprises such a surface needs to be cleaned or replaced. We exploit the model-checking features of Real-time Maude to formally prove that a regular cleaning or replacement of the infrastructure prevents the bioﬁlm formation.


Introduction
Biofilms are microstructured bacterial communities that live at interfaces. They are the most common mode of life for microorganisms in both terrestrial and marine environments. Usually, biofilms thrive at liquid/solid interfaces like the inner surface of water pipes [29] and catheters [28], or in soil and sediments [10]; biofilms play a beneficial role in wastewater treatment, where they increase organic carbon degradation and contaminant removal. However, biofilms harbor pathogens and protect them from antimicrobial agents, thus posing serious threats in health settings. Biofilms consist of bacterial cells encased in selfproduced extracellular polymeric substances, collectively termed biofilm matrix.
The biofilm matrix is composed of carbohydrates, proteins and extracellular DNA. The relative proportion of these components is species-dependent and varies with biofilm age and nutrient concentration [17].
The biofilm life cycle is illustrated in Fig. 1. Biofilm formation initiates when single planktonic cells enter in contact with a surface, usually solid. Following this initial interaction, several steps can be described, including reversible attachment, irreversible attachment, microcolonies formation, and biofilm maturation, in which complex microstructures are formed and the biofilm reach its maximum thickness. In the planktonic-sessile transition, cells lose their flagella and start producing extracellular polymeric substance. The biofilm life cycle ends with biofilm detachment or dispersal, where part of the cells transitions from biofilm to planktonic state and move downstream to seed other sections of the surface. The biofilm life cycle has been validated with microscopy, transcriptomics and metabolomics analysis [30]. (2) Newly attached planktonic cells ("settler" biofilm); (3a) Fully mature biofilm stage (biofilm phenotype); (4) Newly single cells dispersed from the biofilm (newly dispersed phenotype); (5) Detached biofilm aggregates (biofilm phenotype); (6) Reattached biofilm aggregates (biofilm phenotype); (7) Newly dispersed phenotypes cells from the biofilm giving rise to planktonic phenotype cells.
The biofilm early formation is largely dependent on the concentration of bis-(3'-5')-cyclic dimeric guanosine monophosphate (c-di-GMP), a secondary messenger that is ubiquitous in the bacterial world. Enzymes that synthesize c-di-GMP or degrade it are found in great numbers, indicating its importance as a regulator of many bacterial functions. C-di-GMP contributes to determine motility, biofilm formation, and production of multiple proteins in microorganisms. It binds to a large variety of effector components and controls targets involved in transcription and formation of large cellular and extracellular structures.
In the biofilm life cycle, the most important step is the initial attachment, in which the cells transition from free-swimming state (so-called planktonic) to sessile state, in which they attach to solid surface, lose their motility and start producing the extracellular matrix, in addition to the normal growth process [9]. Modelling of initial biofilm formation is important to predict the extent of biofilm growth on removable biomedical devices (e.g., catheters) and drinking water pipes, thus allowing for planned cleaning or replacement of biofilm-contaminated parts and minimising the risk of infections [11]. The effectiveness of cleaning treatment depends also on biofilm concentration, thus the formal modelling of initial biofilm formation will contribute to optimising the frequency and the duration of antimicrobial application in biomedical devices and water systems.
A number of biofilm models have been developed during the last 30 years. They can be roughly divided into continuum and discrete models.
Continuum models simulate biofilms in a quantitative and deterministic way [16]. However, such models may result in high computational complexity, particularly for multispecies biofilms and when considering multiple substrates [3]. Early continuum models focused on cell growth and microstructure formation. Modern continuum models concern the effect of fluid dynamics on colony formation and are validated with time-resolved single cell imaging to reveal important details on cell-fluid interactions at different biofilm ages [23].
Discrete models, on the other hand, are very successful in representing the multidimensional heterogeneity of biofilm, but introduce elements of randomness and stochastic effects into the solutions [16]. Agents-based models, based on several platforms, eventually integrated (e.g., NetLogo, MatNet), are especially popular because of their simplicity and low computational requirements. However, they may become highly sophisticated when integrating multiscale and constraint-based metabolic modelling [7].
Hybrid biofilms models are the most recent. They support the simulation of discrete bacterial cells within a multiphase continuum consisting of extracellular polymeric substance (EPS) and water as separate interacting phases. These models support the prediction of bacterial colony formation. The distribution of bacterial growth and EPS production is sensitive to the pore spacing between bacteria and the consumption of nutrients within the bacterial colony [12].
Although formal methods have been widely used in systems biology [6], to the best of our knowledge, no formal models of biofilms have been reported in the literature. After Pȃun's work on P-Systems [24], rule-based systems have been widely used in the modelling of biological systems, due to the natural way in which they can express chemical reactions and biological interactions. The use of a rule-based systems in combination with a formal analysis tool allows us to generate a model that focuses on the individual bacterial cells, whose behaviour can be defined using simple rules, assuming that the cell properties are known, rather than using a computationally expensive multidimensional continuum. We use Real-Time Maude [19,21], a toolset that comprises a rule-based formal modelling language, which uses rewriting logic [18] to model system state transitions, and high-performance simulation and model checking engines, which support formal analysis.
The rest of this paper is organised as follows. Section 1.1 provides a brief highlight of Real-Time Maude and refers to the sections of the paper where the different aspects of the language are illustrated. The Real-Time Maude code for the model, the experiments and the formal analysis can be downloaded from a GitHub repository 1 . Section 2 presents our approach for modelling a bacterial population and its evolution, the process of biofilm formation as well as preventive interventions. Section 3 illustrates our approach using an example inspired by the information about a specific microorganism, Pseudomonas aeruginosa, which is available in the literature [1,25,31]. It exploits the example to illustrate both in silico experiments and the use of the model-checking features of Realtime Maude to predict the outcome of the experiments and analyse intervention plans. Section 4 concludes the paper with further considerations on our approach and ideas for future work.

Real-Time Maude
Real-Time Maude [19,21] is a formal modeling language and high-performance simulation and model checking tool for distributed real-time systems. It is based on Full Maude, the object-oriented extension of Core Maude, which is the basic version of Maude [20]. Real-Time Maude makes use of -algebraic equational specifications in a functional programming style to define data types; -labeled rewrite rules to define local and global transitions; -tick rewrite rules to advance time in the entire system state.
In this paper we do not go into the details of Real-Time Maude syntax but we focus on rewrite rules and commands for simulation and formal analysis. In this section we provide high-level information on the Core Maude data type definition and Full Maude classes, subclasses and objects. Labelled rewrite rules for defining global transitions and Real-Time Maude tick rewrite rules are illustrated in Sect. 2. Commands for simulation and formal analysis are illustrated in Sect. 3.
A Core Maude data type is defined using an algebraic signature, that is, a set of declarations of sorts, subsorts, and operators. Operators are defined in an equational way as well as by using special tags to declare common properties, such as commutativity, associativity, having a specific identity and constructors. Constructors are the carriers of the sort, in the sense that they define in a unique way each member of the sort.
declares a class C with attributes att 1 , . . . , att n of sorts s 1 , . . . , s n , respectively. An object of class C in a given state is represented as a term A subclass inherits all the attributes and rules of its superclasses. The class declaration syntax is also used for subclasses and supports the definition of new attributes specific to the subclass.
Real-Time Maude specifications are organised in modules, which can be imported by other modules and, as we will see in Sect. 3, be referred by the commands for simulation and formal analysis.

Biofilm Formation Model
We use Real-Time Maude to model a bacterial population as a multiset of objects of a class Bacterium with three attributes: state which can be either planktonic or sessile; toDuplication which is a natural number representing the remaining lifetime in minutes of a cell before duplication; c-di-GMP-internal which is a natural number representing the concentration of c-di-GMP inside a single cell and is expressed in nanomoles (nM) We also use a single object of a class Population to collect the global data of the entire population of bacteria. The class Population consists of the following attributes: state which can be one of the following values, whose meaning is illustrated in Fig. 2: creation, next-time, ready-to-reproduce, ready-to-die and attributes-updated. size which is a natural number representing the number of cells of the population; planktonic-cells which is a natural number representing the number of cells of the population that are in planktonic state; sessile-cells which is a natural number representing the number of cells of the population that are in sessile state; c-di-GMP-global which is a natural number representing the global concentration of c-di-GMP for the entire population and is expressed in nanomoles (nM) with respect to the global volume of the cells; biofilm which is a boolean stating whether the biofilm is formed or not.
We preferred to keep the multiset of Bacterium objects 'outside' the population data, rather than incorporating it as a field of the Population object. This choice aims at considering the 'global population' and the 'multiset of bacteria' as a sort of pair of 'interacting agents' whereby, in a prospective extension of our approach, global data on the population behaviour may actually effect the development of the multiset of agents. In fact, it is easier in biology to have availability of data on the behaviour of the population as a whole rather than on the behaviour of the single cells that make up the population. Therefore, the system that describes the bacterial ecosystem consists of one object of class Population and a multiset of objects of class Bacterium. Figure 2 illustrates the initial creation of the bacterial population and the evolution of the population. Evolution consists of TIME-GRANULARITY unit cycles. During each cycle the one object of class Population goes through the different states that update the attributes of all objects (bacteria and population) and terminate the cycle with a TIME-GRANULARITY unit time increment.
With reference to Fig. 2, in Sect. 2.2, 2.3, 2.4, 2.5 and 2.6 we illustrate the rewrite rules that drive the population initial creation and its evolution. Section 2.7 illustrate how our model may be extended to include intervention preventing biofilm formation. Section 2.1 set the context for the next sections by briefly explaining the notion of system state and overviewing the syntax and semantics of rewrite rules.

Real-Time Maude Configuration and Rewrite Rules
The global system state is a term of pre-defined sort Configuration and is a multiset of objects. Multiset union is denoted by an associative and commutative juxtaposition operator, so that rewriting is multiset rewriting. Transitions between global system states are defined using rewrite rules. Maude

Creation of the Initial Population
In order to start with an initial population of planktonic cells uniformly distributed in term of cell age, we consider a constant value INITIAL-POPULATION for the initial number of bacteria and use the following two rewrite rules: initial-population-creation which cyclically assigns an age between 1 and the age at duplication (i.e., the duration of the cell life cycle) for the specific species to the newly created cell; initial-population which stops the creation process by changing the population state from create to next-time, when the bacterial population reaches the value INITIAL-POPULATION, thus disabling the previous rule.
The first rule also initialises the concentration of c-di-GMP internal to the cell. This concentration depends on the age of the cell and on food availability.

Biofilm Formation
The transition of the object of class Population from state next-time to state ready-to-reproduce updates the information about the biofilm formation and the changes of state of the single cells (from planktonic to sessile and vice versa). We consider two threshold values for the concentration of c-di-GMP: C-DI-GMP-THRESHOLD-BIOFIM and C-DI-GMP-THRESHOLD-SESSILE.
The constant C-DI-GMP-THRESHOLD-BIOFIM represents the global concentration of c-di-GMP that triggers the transition of the bacterial colony to a biofilm, when the threshold is reached or exceeded, and the transition back to a dispersed colony, when the concentration drops below the threshold. The transition is recorded in the boolean attribute biofilm of the object of class Population.
The constant C-DI-GMP-THRESHOLD-SESSILE represents the internal concentration of c-di-GMP of a single cell that triggers the transition of the state of that cell to sessile, when the threshold is reached or exceeded, and the transition back to planktonic, when the concentration drops below the threshold. Note that this rewrite rule is always enabled in the state next-time in order to update the state of each single cell after the concentration of c-di-GMP internal to the cell has been updated during the last time increment. The operator changeState recursively changes the state of the entire multiset of bacteria by comparing the concentration of c-di-GMP internal to the cell with the threshold C-DI-GMP-THRESHOLD-SESSILE. The operator countCells recursively counts the number planktonic and sessile cells of the NEW-BACTERIA multiset, which is the bacterial colony after updating the states of the single cells.

Cell Reproduction
Cell reproduction may only occur when the bacterial population does not exceed a given REPRODUCTION-THRESHOLD threshold. The reproduction process is modeled by the operator mitoses which, for each cell that has reached the reproduction age, i.e., whose toDuplication attribute equals 0, 1. resets the attribute toDuplication of that cell to a given LIFE-DURATION value, which represents the duration of the life cycle of the cell; 2. creates a new cell identical to the previous one apart from a 0 value for the concentration of c-di-GMP internal to the cell; 3. uses the size of the population, passed as the second argument, to define the object identifiers of the new cell by recursively incrementing such argument.
We have chosen to leave the entire concentration of c-di-GMP within one of the two cells. This is consistent with recent literature, which has shown that asymmetric division results in a better colonization of the surface [8], with the formation of multiple microcolonies [14].
The rewrite rule reproduction models the cell reproduction process and its impact on the attributes of the Population object: All attributes of the object of the class Population are updated to take into account the newly created cells. In particular, the global concentration of c-di-GMP is updated by using the operator count-c-di-GMP, which sums up the concentrations of c-di-GMP internal to all the single cells in multiset NEW-BACTERIA (after the reproduction has occurred).

Cell Death
When the bacterial population exceeds the given REPRODUCTION-THRESHOLD threshold, cell reproduction can non longer occur. Instead, cells start dying at a rate that has been estimated around 5% for each life cycle [2]. The rewrite rule death models such a form of death: As shown in Fig. 2, before applying the rewrite rule death, we need to use the rewrite rule rename to rename all objects of the class Bacteria. This purely technical manipulation, which has no biological meaning, allows us to reuse the object names and prevents the number of system states to grow arbitrarily. Since, the balance between reproduction and death keeps the population below some size threshold, the behaviour of the model has a finite (though very large) number of states and can, in principle, be analysed using model checking with no time limitations. However, in practice, we normally introduce a time upper bound when using model checking to avoid the state explosion problem. The rewrite rule death is enabled by the population state ready-to-die which, as shown in Fig. 2, is changed from ready-to-reproduce by the rewrite rule rename, when the attribute size of the Population object exceeds the threshold REPRODUCTION-THRESHOLD. The rewrite rule death uses the operator starvation to remove a DEATH-RATE per cent of the population over a duration corresponding to the cell life cycle (i.e., LIFE-DURATION).
Let TIME-GRANULARITY be the number of units time advances at each increment. The number M of cells to remove during one time unit is calculated by considering the number of cells N * DEATH-RATE quo 100 to be removed during an entire life cycle and dividing it by LIFE-DURATION. This value has to be multiplied by TIME-GRANULARITY to get the number of cells to be removed at each time increment. Note that the terms of the expression have been rearranged in the rewrite rule condition to avoid the inclusion of term N * DEATH-RATE quo 100 quo LIFE-DURATION, which may become 0 for small values of N even for a coarse time granularity (values of TIME-GRANULARITY greater than 1).
In addition to removing M cells from the multiset BACTERIA independently of their ages, the reduction of expression starvation(BACTERIA, M, 0) recursively increments its third argument by 1 and uses it to rename cells that do not have to be removed from the multiset. In this way, the resultant multiset of bacteria will be named with all natural numbers between 1 and the size of the bacterial population. Furthermore, the operator starvation resets the toDuplication attribute of each bacteria that is not removed to LIFE-DURATION. This ensures that the uniform distribution of the cell ages is preserved.

Time Increment
Both rewrite rules reproduction and death change the state of the object of class Population to attributes-updated. This is the final state for the current time interval and features the updated values for all attributes of the objects of the system configuration.
In the state attributes-updated time is incremented discretely as intervals of TIME-GRANULARITY units by the following tick rewrite rule: The operator idle not only advances time for all cells but also calculates the internal concentration of c-di-GMP, depending on the age of the cell after the the increment, and changes the Population state to next-time to be ready for updating all the object attributes during the new time interval.

Intervention to Prevent Biofilm Formation
In this section we show how to model a simple form of intervention to prevent the formation of a biofilm on a specific surface: the replacement of the material infrastructure that comprises such a surface. To this purpose we consider a constant time TIME-BETWEEN-INTERVENTIONS, we extend the class Population by defining a subclass Intervention which has the additional following attribute: next-intervention which is the possibly infinite time before the next intervention; and we replace the tick rewrite rule introduced in Sect. 2.6 with the following tick-without-intervention tick rewrite rule: When the next-intervention attribute of the Intervention object has become as low as 0, the following tick-with-intervention rewrite rule reset the system to the initial population: Note that when TIME-BETWEEN-INTERVENTIONS is set to infinite value INF, rule tick-with-intervention can never be applied.

In Silico Experiments and Formal Analysis
Maude modules are executable and the Real-Time Maude toolset provides a variety of formal analysis methods.
The timed rewriting command simulates one of the system behaviours of module m by rewriting the initial state s 0 using up to r term rewrites and up to duration t. The timed rewriting command provides us with an important tool to perform in silico experiments, by simulating experiments that would last for hours within a few seconds. Although only one of the possible system behaviour of a nondeterministic system is shown during simulation, this may provide important information on the biological system evolution.
The timed search command supports the analysis of all possible behaviours from a given initial state, relative to the chosen time sampling strategy. The command performs reachability analysis by searching for the states that match a search pattern and are reachable in a given time interval (if indicated). There are several variants for the syntax of the search command, such as: where s is the search pattern, n is the number of solutions searched for and t is the time at which to stop the search. Section 3.1 introduces the case study that we use to illustrate our approach. Section 3.2 illustrates the use of timed rewriting to perform in silico experiments and Sect. 3.3 illustrates the use of timed search to perform formal analysis.

Pseudomonas Aeruginosa
Pseudomonas aeruginosa is a Gram-negative microorganism, an opportunistic pathogen, actually one of the most important pulmonary pathogens and the predominant cause of morbidity and mortality in cystic fibrosis. P. aeruginosa forms rapidly biofilms on plastic surfaces, agar medium, graphite, metals and other materials. The composition of its extracellular polymeric substance has been studied in details and high quality data are available on the c-di-GMP distribution in cells, division upon replication and concentration with time measured through fluorescence in recombinant laboratory strains.
The life cycle of P. aeruginosa has a duration of 120 min [31] and the concentration of c-di-GMP internal to a cell in relation to the cell age is given in Table 1 [1].

Simulation
In order to illustrate our approach we consider the simulation reported in Table 2.
We have defined Real-Time Maude module P-AERUGINOSA, which models a simplified bacterial ecosystem with values of the parameters taken from the literature on P. areruginosa [1,25,31]. The model is defined without intervention. The results in Table 2 have been produced using the command with t representing the considered time in minutes (second column of Table 2).
The duration of the simulation is 12 h. We use an initial population of 120 cells in order to have a uniform distribution of the cells within the 120 min of cell life cycle. We adopt a 10 min time granularity to be able to observe cell death at each time increment, when the population size is above the REPRODUCTION-THRESHOLD threshold. In fact, due to the small population size we considered, using a finer granularity would result in a 0% death rate for each time increment. The small population size has been chosen due to the illustrative purpose of the simulation and to the need to have a computational response time of the order of seconds.
The chosen 10 min time granularity is a sufficient time step to model the biofilm formation phenomenon and is within the range adopted in previous studies. In general the time step is chosen depending on the bacterial species considered and on the phenomenon under investigation (e.g., biofilm initial attachment, biofilm detachment, viscoelastic modification of the matrix, etc.). It is commonly accepted that the time step for microbial phenomena is much higher than for hydrodynamic phenomena. The time step used in previous studies range from five minutes in P. aeruginosa biofilm growth [7] to one hour for oral biofilm formation [15] and hours in biofilm formation and bacteria decay [26].
The simulation in Table 2 shows a growth of the population size and of the global concentration of c-di-GMP due to cell reproduction, with a biofilm formation at time 2:50, when the c-di-GMP global concentration threshold is reached. The growth of both values then continues until time 3:00, when the population  [1,25,31]. However, we can also consider a single cell as a representative of a cluster of cells that, globally, exhibits a homogeneous behaviour. This approach is commonly used and considered realistic [22,27].

Formal Analysis Using Model Checking
The search command can be used to predict the biofilm formation in an in silico experiments. With reference to the experiment considered in Sect. 3.2, we can use the search command (tsearch [3] in P-AERUGINOSA init =>* {C:Configuration < P-aeruginosa : Population | state : attributes-updated, biofilm : true >} with no time limit .) to find the solutions for times 170, 180 and 190 in Table 2. Note that we must include the state attributes-updated in the search pattern to make sure that the solution has all attributes updated for the specific time unit.
Moreover, the results of the experiment shown in Table 2 seem to suggest that the biofilm, once formed, will persist forever. Such a conjecture cannot be validated through an experiment, since we do not know for how long we need to continue the experiment to possibly observe the biofilm disappearing. However, using the search command we find out that after 860 min (14 h and 20 min) the biofilm disappears and the dispersed population consists of 385 cells (357 planktonic and 28 sessile) with 38 670 as a global c-di-GMP concentration. This proves that the conjecture suggested by the experiment results was actually false.
We have also defined module P-AERUGINOSA-INTERVENTION, which extends module P-AERUGINOSA by modelling a replacement intervention to be repeated every 160 min, namely TIME-BETWEEN-INTERVENTIONS equals 160. Then the search command (tsearch [1] in P-AERUGINOSA-INTERVENTION init =>* {C:Configuration < P-aeruginosa : Intervention | state : attributes-updated, biofilm : true >} in time <= 3000 .) will return 'No Solution' thus predicting that no biofilm would be formed even if we could extend the duration of the experiment to 3 000 min (50 h).

Conclusion and Future Work
We have used Real-Time Maude to define an approach for the formal modelling and analysis of early biofilm formation. Our approach supports both simulation to mimic lab experiments and formal analysis by exploiting Real-Time Maude timed search command to predict 1. a specific outcome of an experiment; 2. the absence of a given outcome from all possible results of an experiments. Prediction 2 is essential for planning effective interventions to prevent the formation of a biofilm. Although the kind intervention considered in this paper is a simple replacement, as part of our future work we plan to consider more sophisticated forms of intervention, such as the use of antibacterial and quenching.
Quenching refers to the capture/deactivation of c-di-GMP and other messengers or signalling molecules involved in the process of biofilm formation and maturation. Recent examples of c-di-GMP quenching agents include the aromatic compound coumarin [32], the fungal-produced antimicrobial terrein [13], the immunosuppressive drug azathioprine [5] and the antimetabolite drug sulfathiazole [4]. These and other c-di-GMP quenching compounds can be used to reduce biofilm formation without eliciting antimicrobial resistance in the target microorganisms.
We also plan to extend our model by including more accurate biological mechanisms and by improving the precision of the generated behaviour. For example, in terms of biological mechanisms, we have neglected the cell grows through time and normalised the c-di-GMP concentration with respect to the growing volume of the cell. Separating the two mechanisms, cell growth and c-di-GMP concentration, would result in a more accurate model. For the case study of P. aeruginosa, detailed microbial growth parameters are reported by Schleheck et al. [25]: duration of the log growth phase, stationary phase, and death phase. The precision of the generated behaviour could be improved by using the Population class to record further global aspects in order to support the mapping of a global behaviour observed in lab experiments to the behaviour of individual cells. For example, we could record the death toll over a period of time that equals the life cycle and make adjustments during the rewrite process to keep it within two given thresholds in accordance with the given birth rate. This would allow us to avoid the problem discussed in Sect. 2.5 and make the death rewrite rule work with higher precision and with no constraints on the size of the population and time granularity. Finally we plan to consider a time interval rather than an exact time for cell life-cycle duration and, at a later stage, introduce stochasticity into our model.