Background

Escherichia coli is a model bacterium and a key organism for laboratory and industrial applications. E. coli strain C was isolated at the Lister Institute and deposited into the National Collection of Type Cultures, London, in 1920 (Strain No. 122). It was characterized as more spherical than other E. coli strains and its nuclear matter was shown to be peripherally distributed in the cell [1]. E. coli C, called a restrictionless strain, is permissive for most coliphages and has been used for such studies since the early 1950s [2]. Genetic tests showed that E. coli C forms an O rough R1-type lipopolysaccharide (LPS), which serves as a receptor for bacteriophages [3]. Its genetic map, which shows similarities to E. coli K12, was constructed in 1970 [4]. It is the only E. coli strain that can utilize the pentitol sugars, ribitol and D-arabitol, and the genes responsible for those processes were acquired by horizontal gene transfer [5]. Some research on genes involved in biofilm formation in this strain has been attempted but has not been continued (Federica Briani, Università degli Studi di Milano, personal communication) [6].

Biofilm is the most prevalent form of bacterial life in the natural environment [7,8,9,10,11]. However, in laboratory settings, for decades, bacteria have been grown in liquid media in shaking, highly aerated conditions, which select for the planktonic lifestyle. While all laboratory strains of E. coli, such as K12, B, W, and Crooks, are poor biofilm formers, environmental isolates usually form robust biofilms. Some of these environmental strains cause diarrhea and kidney failure, while others cause urinary tract infections, chronic sinusitis, respiratory illness and pneumonia, and other illnesses [12,13,14,15]. Many of these symptoms are correlated with biofilms. A few E. coli K12 mutant strains have been described as good biofilm formers, such as the csrA mutant or AJW678 [16, 17], but one can argue that these mutants cannot occur in natural conditions. Therefore, it is important to find a safe laboratory strain that can serve as a model for biofilm studies.

We found that the E. coli C strain forms a robust biofilm under laboratory conditions. The complete genome sequence of this strain was determined and bioinformatics analyses revealed the molecular foundations underlying this phenotype. A combination of experimental and in silico analysis methods allowed us to unravel the two major mechanisms that drive biofilm formation in this strain.

Results

Biofilm formation

In our search for a good model biofilm strain, we screened our laboratory collection of E. coli strains using the standard 96-well plate assay [18] and the glass slide assay [19] (Fig. 1a). We found that the E. coli C strain formed robust biofilms both on microscope slides and in 96-well plates. In minimal M9 medium with glycerol, strain C produced 1.5- to 3-fold more biofilm than the other laboratory strains, and in Luria-Bertani (LB) rich medium, biofilm formation by strain C was as much as 7.4-fold higher (Fig. 1b).

Fig. 1

Biofilm formation by E. coli strains on (a) microscope slides (LB medium) and (b) 96-well plates (LB and M9 with glycerol)

During overnight growth in LB medium at 30 °C shaken at 250 rpm, we noticed an increased aggregation of bacterial cells in the E. coli C culture (Fig. 2a). The ratio of planktonic cells to total cells in the E. coli C culture was 0.35, compared to 0.83 and 0.85 for Crooks and B, respectively, and 0.98 or almost 1 for K12 and W (Fig. 2b).

Fig. 2

Cell aggregation in overnight culture grown at 30 °C in LB Miller broth on shaker at 250 rpm. a From left: E. coli C, E. coli Crooks, E. coli B, E. coli K12, and E. coli W. b Ratio of planktonic cells to total cells measured as OD600. c Microscopic picture of the E. coli C precipitate

Aggregation at low temperature depends on salt concentration

Previously, we described a regulatory loop affecting biofilm formation in a high salt/high pH environment. This loop involved the nhaR, sdiA, uvrY, and hns genes, as well as the csrABCD system [20]. We therefore asked whether the aggregation of E. coli C depends on NaCl concentration. We grew the bacteria in three LB broth media containing different amounts of NaCl: Miller broth (1% NaCl), Lennox broth (0.5% NaCl), and a modified Lennox broth with 0.75% NaCl. After overnight growth at 30 °C in culture tubes shaken at 250 rpm, we observed a lack of aggregation in standard Lennox medium, while in the modified Lennox medium and Miller broth the ratio of planktonic to total cells was similar (Fig. 3). The ratio of planktonic to total cells in the Lennox medium was significantly different (p < 0.00001, one-way ANOVA) from that in media with a higher NaCl concentration and similar to that of the other strains grown in LB Miller broth (Fig. 2b).

Fig. 3

Effect of salt concentration on E. coli C aggregation at 30 °C

E. coli C genomic sequence

The genomes of E. coli K12, E. coli B, E. coli W, and E. coli Crooks (GenBank:CP000946) have already been sequenced [21,22,23]. To compare the genomic sequences of all five laboratory strains, we sequenced the E. coli C genome. The chromosome consisted of 4,617,024 bp and encoded 4581 CDSs (Fig. 4). No extrachromosomal DNA was detected. The mean G + C content was 51%. We identified 7 rRNA operons, 89 tRNA genes, and 12 ncRNAs (121 RNA genes in total; GenBank: CP020543.1). The only methylation signal in the genome was Dam methylation: 38,387 out of 38,406 (99.95%) GATC motifs showed evidence of m6A.
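The summary statistics above (G + C content and the fraction of methylated GATC motifs) can be illustrated with a minimal Python sketch. The function names and the toy sequence are hypothetical and serve only as an illustration; this is not the pipeline used to produce the reported values, which came from the PacBio analysis.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def count_motif(seq: str, motif: str = "GATC") -> int:
    """Count occurrences of a motif on the given strand, allowing overlaps."""
    seq = seq.upper()
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def methylated_percent(methylated: int, total: int) -> float:
    """Percentage of motif instances with evidence of methylation."""
    return 100.0 * methylated / total


# Toy sequence (hypothetical), not part of the E. coli C genome
toy = "GATCGGATCCATGATC"
print(gc_content(toy), count_motif(toy))

# Counts reported in the text for the E. coli C chromosome
print(round(methylated_percent(38387, 38406), 2))
```

With the counts given in the text, the last line reproduces the reported 99.95%.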

Fig. 4

Circular map of the E. coli C chromosome (position in bp). The inner circles show GC skew and G + C content. The third circle shows rRNA (blue) and CRISPR (red) clusters. The fourth circle shows hypothetical ORFs (green). Light blue circles represent ORFs on plus and minus strands

Comparison with other laboratory E. coli strains showed a high degree of synteny except for an inverted 300 kb region between 107 and 407 kb (Fig. 5). That inverted region also showed an inverted GC skew in comparison to the flanking regions, indicating either a recent inversion event or an assembly error (Fig. 4). To verify that the inversion represented an actual event, we used an optical mapping method [24]. The order of the fluorescently labelled fragments obtained was identical to the in silico constructed map of the E. coli C chromosome (Fig. 6a), indicating the authenticity of the inversion.
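The reasoning behind the GC skew observation can be sketched in a few lines of Python. GC skew is commonly computed as (G − C)/(G + C) in windows along one strand; inverting a segment replaces it with its reverse complement, which flips the sign of the skew in that segment. The window size, function names, and toy sequence below are illustrative assumptions and do not correspond to the software used to draw Fig. 4.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_skew(seq: str, window: int) -> list:
    """GC skew, (G - C) / (G + C), for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


# Toy G-rich sequence (hypothetical): three windows of 10 nt each
original = "GGGCA" * 6
# Simulate an inversion of the middle window
inverted = original[:10] + reverse_complement(original[10:20]) + original[20:]

print(gc_skew(original, 10))  # uniform positive skew
print(gc_skew(inverted, 10))  # sign flips in the inverted window
```

A real assembly error that misorients a contig would produce the same sign flip, which is why the skew alone could not distinguish the two explanations and an independent method was needed.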

Fig. 5

Genome alignment of five E. coli strains using Mauve. Each chromosome has been laid out horizontally and homologous blocks in each genome are shown as identically colored regions linked across genomes. The inverted region in E. coli C is shifted below the genome’s center axis. From the top: E. coli C, K12, Crooks, W, and B

Fig. 6

Optical mapping of E. coli C chromosome and comparison to K12 strain. a In silico generated map (blue) and optical map (yellow/green) of E. coli C. b In silico generated map (blue) and optical map (yellow/green) of E. coli K12

A similar comparison of E. coli K12 maps confirmed the stringency and precision of the optical mapping results (Fig. 6b). Comparison between the two optical maps confirmed that the PacBio-predicted inversion of the 300-kb DNA fragment was indeed a real event (Fig. 6b).

Genetic content

A maximum likelihood tree showed that E. coli C was most similar to the K12 strain (Additional file 1: Figure S1). A comparison of chromosomal protein-coding orthologs among the laboratory strains showed that, out of the 5686 predicted CDSs, 3603 were shared among all five strains. Only 37 genes that were present in all four of the other laboratory strains were absent from E. coli C (Additional file 11: Table S1; Fig. 7). Out of 177 genes that were unique to E. coli C, 108 encoded transposases or unknown proteins and 69 CDSs showed homology to known proteins (Table 1).

Fig. 7

Comparison of orthologous CDSs among C, W, K-12, B, and Crooks strains. The number of shared genes, the number (log10) of unique genes, and the genes shared between one, two, three, and four strains are shown. Graph was generated with the UpSet software [25]

Table 1 Sixty-nine unique genes in E. coli C genome

Genes involved in biofilm formation

Several genes have been ascribed active roles in biofilm formation in E. coli [26,27,28]. One of the most important is the flu gene encoding the antigen 43 (Ag43) protein [29]. In liquid culture, Ag43 leads to autoaggregation and clump formation rapidly followed by bacterial sedimentation. Surprisingly, the flu gene was not present in the E. coli C genome. We identified a few autotransporter-encoding genes that showed partial homology to Ag43, such as B6N50_05815 (50% similarity over 381 aa); however, the homology was too weak to suggest that these genes could play a similar role.

Surface polysaccharides often play an important role in biofilm formation [27, 30]. E. coli C forms an O rough R1-type lipopolysaccharide, which serves as a receptor for bacteriophages [3]. Out of the 14 waa genes present in E. coli K12, we were able to find only 6 in E. coli C. Out of the 5 genes waaA, waaC, waaQ, waaP, and waaY, which are highly conserved and responsible for assembly and phosphorylation of the inner-core region [31], only the first 4 were present in E. coli C (Additional file 2: Figure S2). The two remaining genes in E. coli C were waaG, whose product is an α-glucosyltransferase that adds the first residue (HexI) of the outer core, and waaF, which encodes a HepII transferase [31]. Biofilm formation by a deep rough LPS hldE mutant of the E. coli BW25113 strain was strongly enhanced in comparison with the parental strain and other LPS-deficient mutants. The hldE strain also showed a phenotype of increased autoaggregation and stronger cell surface hydrophobicity compared to the wild-type [32]. The gene hldE, which encodes a HepI transferase, was found in the E. coli C strain. Other mutations in LPS core biosynthesis that result in a deep rough LPS have been described to decrease adhesion to abiotic surfaces [33]; therefore, we assumed that other genes in this family would not be responsible for the increased biofilm formation by E. coli C.

We also noticed that wzzB, a regulator of the length of the O-antigen component of LPS chains, was mutated by an IS3 insertion. Another IS insertion was located in the UDP-glucose 6-dehydrogenase gene (B6N50_08940). Both of these genes were located at the end of a long operon-like stretch of 35 genes in E. coli C, which includes the wca operon [34], consisting of 19 genes involved in colanic acid synthesis.

We found that the region involved in biosynthesis of poly-β-1,6-N-acetyl-glucosamine (PGA) was almost 100% identical in both K12 and C strains.

Other types of structures involved in biofilm formation are fimbriae, curli, and conjugative pili [26, 27, 35]. Type 1 pili can adhere to a variety of receptors on eukaryotic cell surfaces. They are well-documented virulence factors in pathogenic E. coli and are critical for biofilm formation on abiotic surfaces [36,37,38,39,40]. Type 1 pili are encoded by a contiguous DNA segment, labeled the fim operon, which contains 9 genes necessary for their synthesis, assembly, and regulation [41, 42]. In E. coli C, almost the entire fim operon was absent and replaced by a type II group integron (Additional file 3: Figure S3); only fimH, which codes for the mannose-specific adhesin located at the tip of the pilus, remained. The entire fim operon is driven by a single promoter located upstream of the fimA gene; therefore, it is possible that the fimH gene is not expressed in E. coli C. Although we cannot exclude a role of FimH in autoaggregation of E. coli C, reports that the function of FimH is inhibited by growth at temperatures at or below 30 °C [43] make it highly unlikely.

Chaperone-usher (CU) fimbriae are adhesive surface organelles typical of many gram-negative bacteria. E. coli genomes contain a large array of characterized and putative CU fimbrial operons [44]. Korea et al. characterized the ycb, ybg, yfc, yad, yra, sfm, and yeh operons of E. coli K-12, which display sequence and organizational similarities to type 1 fimbriae exported by the CU pathway [45]. They showed that, although these CU operons were not well expressed under laboratory conditions, 6 of them were nevertheless functional when expressed and promoted attachment to abiotic and/or epithelial cell surfaces [45]. A total of 10 CU operons have been identified in E. coli K12 MG1655 [44]. We identified all 10 CU operons in the E. coli C genome. Furthermore, we found that the IS5 insertion in the K12 yhcE gene was not present in E. coli C (Additional file 4: Figure S4A). We also noticed that two insertion sequences were inserted in the yad region (Additional file 4: Figure S4B).

Curli are another type of proteinaceous extracellular fiber involved in surface and cell-cell contacts that promote community behavior and host cell colonization [46]. Curli synthesis and transport are controlled by two operons, csgBAC and csgDEFG. The csgBA operon encodes the major structural subunit CsgA and the nucleator protein CsgB [47]. CsgC plays a role in the extracellular assembly of CsgA. In the absence of CsgB, curli are not assembled and the main subunit protein, CsgA, remains unpolymerized when secreted from the cell [46]. The csgDEFG operon encodes 4 accessory proteins involved in assembly of curli. The csgBA operon is positively regulated by the transcriptional regulator CsgD [47]. We found that the intergenic region between csgBA and csgDEFG has been modified in E. coli C. An IS5/IS1182 family transposase was inserted between 106 bp upstream of the csgD gene and 96 bp inside the csgA gene (Additional file 5: Figure S5). The entire csgB gene as well as the first 32 aa of CsgA have been deleted. The full CsgA protein in E. coli K12 contains 151 aa, while the truncated version in strain C consisted of only 107 aa and might not be expressed. Furthermore, csgD expression is driven by a promoter located ~ 130 bp upstream [48, 49]. The IS5/IS1182 family transposase inserted between that promoter and the csgD gene was transcribed in the same direction, so it might not cause a polar mutation, but it would very likely interfere with the sophisticated regulation of csgD expression by multiple transcription factors [48, 49]. As E. coli C did not carry any extrachromosomal DNA, conjugative pili, which usually play an important role in biofilm formation [50], were not analyzed.

Biofilm formation is a bacterial response to stressful environmental conditions [9]. This response requires an orchestra of sensors and regulators during each step of the biofilm formation process. We analyzed a few of the most important mechanisms, such as CpxAR, RcsCD, and EnvZ/OmpR [27]. In all three cases, we observed the same gene structure and a high degree of DNA sequence identity between the E. coli C and K12 strains.

Another regulatory loop includes the carbon storage regulator csrA and its small RNAs [51]. Mutations within the csrA gene induced biofilm formation in many bacteria [17, 51]. Recently, CsrA regulation has been connected with multiple other transcription factors, including NhaR, UvrY, SdiA, RecA, LexA, Hns, and many more [20, 52]. The regulatory loop with the NhaR protein drew our attention as it is responsible for integrating the stress associated with high salt/high pH and low temperature [20]. We found that the nhaAR and sdiA/uvrY regions of E. coli C were almost identical to the corresponding regions in the K12 strain. We amplified and sequenced the csrA gene from the E. coli C strain to verify its presence and integrity (Additional file 6: Figure S6). Detailed analysis of the csrA region revealed the presence of an IS3-like insertion sequence 86 bp upstream of the ATG codon (Fig. 8). The csrA gene is driven by 5 different promoters [53]. The distal (− 227 bp) promoter P1 is recognized by sigma70 and sigma32 factors and enhanced by DksA. The P2 (− 224 bp) promoter depends on sigma70. Both the P1 and P2 promoters are relatively weak [53]. The P3 promoter is located 127 bp upstream of csrA and is recognized by the stationary-phase sigma factor RpoS (sigma38). This promoter is the strongest promoter of the csrA gene. Promoters P4 and P5 are located 52 bp and 43 bp, respectively, upstream of the csrA gene. These promoters are driven by the sigma70 polymerase and are active mainly during exponential growth [53]. The IS3 insertion was located within the − 35 region of the P4 promoter. That location should almost completely abolish expression of the csrA gene in the stationary phase of bacterial growth and was probably the main reason for increased biofilm production by the E. coli C strain. Both small RNAs, csrB and csrC, which regulate CsrA activity, were found unchanged in the E. coli C genome.

Fig. 8

Insertion of the IS3-like sequence in the promoter region of the csrA gene. a Structure of the IS3-like sequence. b Genome view of the K12 (upper) csrA promoter region BLAST results with E. coli C. c csrA promoter region [53] with the IS3 insertion site

Confirmation of IS3 insertion and its complementation by overexpression of csrA gene

First, we compared the biofilm formation ability of E. coli C and the K12 csrA mutant. The 72-h-old biofilms formed by both strains on microscope slides were similar (Additional file 7: Figure S7A). The 24-h 96-well plate biofilm assay showed that at 37 °C the K12 csrA mutant formed 30% more biofilm than E. coli C (p = 0.001, Student's t-test). At 30 °C strain C produced more biofilm, but the difference was not statistically significant (Additional file 7: Figure S7B), although the csrA mutant aggregated ~ 56% more efficiently than the E. coli C strain under the same conditions. To confirm the presence of the IS3 insertion in the csrA promoter region, we designed PCR primers specific for the alaS-csrA intergenic region. Amplification results confirmed the presence of IS3 in the E. coli C promoter region (Additional file 8: Figure S8).

To see if extrachromosomal expression of the CsrA protein affects the aggregation phenotype, we cloned the csrA gene downstream of the plac promoter in pBBR1MCS-5 [54], resulting in plasmid pJEK718, or downstream of the constitutive pcat (chloramphenicol) promoter, resulting in pJEK786. Plasmids were transformed into the E. coli C strain and the resulting clones were grown in LB Miller broth (30 °C, 250 rpm). The results showed that the ratio of planktonic to total cells in E. coli C carrying either construct overexpressing the csrA gene was ~ 1.8 times higher (f-ratio = 78.12363, p < 0.00001) than in the control carrying the non-recombinant vector (Fig. 9). We also noticed that the control strain showed a slightly higher proportion of planktonic cells than the plasmidless control (shown in Fig. 3) (0.46 vs. 0.36), although the difference was not statistically significant (p = 0.09, Student's t-test).

Fig. 9

Complementation of E. coli C aggregation phenotype by introduction of pJEK718 and pJEK786 plasmids overexpressing the CsrA protein

Expression of csrA promoter in E. coli K12 and E. coli C

To analyze the activity of the csrA promoter from E. coli C, we cloned PCR products containing sequences upstream of the csrA gene (Additional file 8: Figure S8) into a pAG136 plasmid vector carrying promoterless EGFP-YFAST reporters (pJEKd1750) [55]. The E. coli C csrA promoter was expressed in both strains; however, the promoter activity was much stronger in the native strain than in K12 (Additional file 9: Figure S9). We noticed that the highest differences (3.2 and 2.4, at 37 °C and 30 °C, respectively) occurred at the late exponential phase (~ 4.5 h and ~ 10 h) (Additional file 9: Figure S9). We also noticed that the presence of an additional copy of pcsrA on a high copy number plasmid induced aggregation of E. coli C at 37 °C. The ratio of planktonic to total cells (Additional file 10: Figure S10) was similar to that obtained for the parental E. coli C strain at 30 °C (Figs. 2 and 3) (0.36 and 0.35, respectively).

The aggregation phenotype was correlated with the highest pcsrA activity at the entrance to the stationary phase (data not shown). As the aggregation might affect the measurements, we decided to use a colony assay to measure the promoter activity over a longer period. LB agar plates with spots of E. coli C and K12 carrying pJEKd1751 reporter plasmids with a short half-life form of GFP [ASV] were incubated at 30 °C and 37 °C and the fluorescence activity was measured with a Typhoon 9400 Variable Mode Imager (Fig. 10). The data showed an increased pcsrA activity over the 72 h time period in both strains, with much higher activity in the native E. coli C strain (Fig. 10). The highest differences between the two strains, 8.15 and 4.71, were observed at 72 h at 30 °C and 37 °C, respectively (Fig. 10). As the half-life of the GFP [ASV] is only 110 min [56], we concluded that in the K12 strain the pcsrA promoter was active mostly in the stationary phase, while in E. coli C its activity was quasi-constitutive but also enhanced in the stationary phase (Fig. 10). To test that hypothesis, we analyzed the spatial expression of the pcsrA promoter in 72-h-old bacterial colonies using a fluorescence microscope (Fig. 11). The images supported our hypothesis. In E. coli C the entire colony showed intense fluorescence, with the highest level in the center (Fig. 11a). In the K12 strain we noticed 5 discrete zones with different fluorescence activities (Fig. 11b). The edge of the colony, which should consist of the youngest, still dividing and metabolically active cells, showed the lowest fluorescence, while the center of the colony, with the oldest cells, showed the highest (Fig. 11b).

Fig. 10

Activity of pcsrA promoter (pJEKd1571) in 24 h, 48 h, and 72 h old colonies of E. coli C and K12 grown at 30 °C and 37 °C on LB Miller agar plates. Data represent the mean values from 3 biological replicates, each containing 3 colonies. Differences between strains at all time points and conditions were statistically significant

Fig. 11

Microscopic picture of the 72 h old E. coli C (a) and K12 (b) colonies containing pJEKd1571 grown at 37 °C on LB Miller agar plates. Pixel intensity plots for each colony are shown below. Yellow arrows show the colony borders and distinct pcsrA expression intensities

Location of IS3-like insertions in E. coli C genome and role of ISs in biofilm gene expression

Based on the E. coli C pcsrA promoter structure in comparison to the pcsrA-K12 [53] and its transcriptional activities, we concluded that the small 80-bp region containing the P4 and P5 promoters could not be solely responsible for the csrA transcription. Insertion sequences play a huge role in bacterial genome evolution [57]. They can also insert upstream of a gene and activate its expression [58]. Out of 177 genes that were unique to E. coli C, 55 encoded transposases (Additional file 11: Table S1).

Using BLAST, we found that the IS3-like sequence present in front of the csrA gene was present in 19 other locations throughout the genome (data not shown). Analyzing these locations, we found that in 12 cases the IS3 might drive the expression of genes located downstream (Table 2). One of the most striking observations was that the IS3-like sequence was located in front of an alternative sigma70 factor, which was not present in the K12 strain (Table 2). Based on the pcsrA expression, we concluded that a promoter located inside the IS3 drives permanent expression of the downstream genes. The presence of the constitutively expressed alternative sigma70 factor in E. coli C can drive expression of the sigma70 promoters in a growth phase-independent manner. As the remaining E. coli C csrA promoters P4 and P5 are sigma70-dependent promoters [53], this might explain their strong activity throughout all growth phases. Further studies will be conducted to test that hypothesis.

Table 2 Genes located downstream of the IS3-like element

Discussion

E. coli is the most common bacterial research model organism. Of the five laboratory strains commonly used, only the E. coli C genome had not been sequenced. Here, we sequenced and analyzed the E. coli C genome and revealed the specific features that lead to enhanced biofilm formation. Recently, a new E. coli strain C genome has been submitted to the GenBank database (CP029371.1). A homology search revealed that this strain was not closely related to our strain. However, a sequence homology search of the E. coli genomes available in GenBank revealed that two isolates, WG5 (CP024090.1) and NCTC122 (LT906474.1), showed identical csrA promoter regions. Strain WG5 is in fact an E. coli C derivative resistant to nalidixic acid [59, 60]. This E. coli C, also known as strain CN, is publicly available in the ATCC (ATCC number 700078). We found that our sequence is very similar to the WG5 sequence, although the inverted 300 kb region between 107 and 407 kb was not present in WG5. Some of the insertion sequences were also absent from the WG5 genome. These findings again point to a role of mobile elements in genome rearrangements and evolution. As the bacterial genome undergoes constant evolution and adaptation [61] and bacterial mobile elements are the most common mechanism of those processes [62, 63], one may ask why in this particular strain, unlike the other laboratory strains, the selection toward planktonic cells did not take place. There is no simple answer; however, we can speculate that, as this strain is used for proliferation of bacteriophages, the fact that phages kill planktonic cells might reduce the selection toward free-floating cells. A second hypothesis is based on the fact that, for bacteriophage research using E. coli C, the ATCC recommends low-salt (0.5% NaCl) or no-salt Nutrient (#139) broth medium. As we showed, the low-salt medium reduced bacterial stress and most likely reduced the level of genome rearrangements, preserving in this laboratory E. coli C strain the natural biofilm-forming properties characteristic of wild-type strains.

Conclusions

Biofilms are the most prevalent form of bacterial life [9, 30] and as such have drawn significant attention from the scientific community over the past quarter century. However, only in 2018 did the number of biofilm related articles reach 24,000, based on a Google Scholar search. As in all other fields, biofilm research needs to develop and follow standard protocols and methods that can be used in different laboratories and give comparable results. Unfortunately, a standardized methodological approach to biofilm models has not been adopted, leading to a large disparity among testing conditions. This has made it almost impossible to compare data across multiple laboratories, leaving large gaps in the evidence [64]. In our work, we described and characterized biofilm formation in the classic laboratory strain, E. coli C [2, 65]. We have used that strain in our biofilm-related research for almost a decade and we would like to share it with the biofilm community and propose to use it as a model organism in E. coli-based biofilm-related research.

Methods

Bacterial strains and growth conditions

Bacterial strains are listed in Table 3. Strains were grown in M9 with glycerol medium or LB Miller, LB Lennox, or modified Lennox with 0.75% NaCl broth with appropriate antibiotics, kanamycin (Km-50 μg/ml), gentamycin (Gm-10 μg/ml), and chloramphenicol (Cm-30 μg/ml).

Table 3 Bacterial strains used in this work

Biofilm assays

Biofilms on microscope slides were grown as described previously [19]. For biofilm formation on a polystyrene surface, flat-bottom 96-well microtiter plates (Corning Inc.) were used [18]. E. coli overnight cultures were diluted 1:40 in fresh medium, and 150-μL aliquots were dispensed into wells. After 24 h of incubation (37 °C), cell density was measured (OD600) using a plate reader, and 30 μL of Gram Crystal Violet (Remel) was applied for staining for 1 h. Plates were washed with water and air dried, and crystal violet was solubilized with an ethanol-acetone (4:1) solution. The OD570 of this solution was determined, and the biofilm amount was calculated as the ratio of OD570 to OD600 [19].
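The normalization described above (biofilm amount as the ratio of OD570 to OD600) and the fold comparison between strains used in the Results can be written as a short Python sketch. The function names and the readings are hypothetical and are shown only to make the arithmetic explicit; they are not measured data from this study.

```python
def biofilm_index(od570: float, od600: float) -> float:
    """Crystal violet signal (OD570) normalized to culture density (OD600)."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive to normalize the OD570 signal")
    return od570 / od600


def fold_difference(test_index: float, reference_index: float) -> float:
    """How many times more biofilm the test strain forms than the reference."""
    return test_index / reference_index


# Hypothetical plate-reader values for two wells (illustration only)
strain_c = biofilm_index(od570=1.50, od600=0.60)
reference = biofilm_index(od570=0.45, od600=0.90)
print(f"strain C: {strain_c:.2f}, reference: {reference:.2f}, "
      f"fold: {fold_difference(strain_c, reference):.1f}")
```

Normalizing to OD600 corrects for differences in growth between wells, so that a strain is not scored as a better biofilm former merely because it reached a higher cell density.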

Construction of CsrA overexpressing strain

A 277-bp DNA fragment containing the csrA gene was amplified using the primers csrAF (aaaGAATTCGTAATACGACTCACTATAGGGTTTC) and csrAR (aaaGAATTCTTTGAGGGTGCGTCTCACCGATAAAG). This fragment was cloned directly into the EcoRI site of the pBBR1MCS-5 vector [54]. Sequence orientation was verified by DNA sequencing and the correct clone with the csrA gene downstream of the plac promoter was named pJEK718. To express the csrA gene from a constitutive pcat (chloramphenicol) promoter, a PCR-amplified cat gene (870 bp; primers catF, aaaGATCCTGGTGTCCCTGTTGATACCGGGAA, and cat-R, aaaGGATCCCCCAGGCGTTTAAGGGCACCAATAAC) was cloned into the BamHI site of one of the clones that carried the csrA gene in the orientation opposite to the plac promoter in the pBBR1MCS-5 vector. Selection for Cm-resistant clones ensured the promoter activity, and the correct orientation was verified by PCR with catF/csrAR primers and DNA sequencing. The correct plasmid was named pJEK786. Plasmids were introduced into the E. coli C strain by TSS transformation [66].

Confirmation of IS3 insertion and construction of GFP reporter fusions

PCR fragments containing the csrA promoter were amplified from both E. coli K12 and C strains using the primer pair pcsrA (aaaagatctCTGATTGCAGGCGTATCTAAGG) and pcsrAR (aaatctagaAAAGATTAAAAGAGTCGGGTCTCTCTGTATCC) and cloned into the BglII/XbaI site of the pAG136 plasmid [55] or the SmaI site of the pPROBE-GFP [LVA] promoter probe vector [56]. All constructs were verified by DNA sequencing. Plasmids were introduced into both the E. coli K12 and C strains by TSS transformation [66]. GFP activity (OD480–520) was measured using BioTek Synergy HT (BioTek) or Tecan InfiniteM200 Pro (Tecan) plate readers and normalized to the optical density of the culture (OD600), yielding relative fluorescence units (RFU; FL480–520/OD600). For quantification of promoter activities in late stationary phase, single colonies were inoculated into 5 mL of LB broth and vortexed, and 5 μL of cell suspension was spotted on LB Miller agar plates. Plates were incubated at 30 °C or 37 °C. At specific time points, plates were scanned with a Typhoon 9400 Variable Mode Imager using 532/526-nm excitation/emission wavelengths (GE Healthcare). Scans were analyzed using the ImageQuant TL software (GE Healthcare). Student's t-test was used to compare results and check statistical significance. Fluorescence microscopy was done with a Keyence BZ-X710 All-in-One Fluorescence microscope (Keyence).

Cell aggregation experiments

E. coli strains were grown in LB Miller broth at 30 °C in shaking conditions (250 rpm). One milliliter of the culture was transferred to standard polypropylene spectrophotometer cuvettes to measure planktonic cells densities (OD600). Remaining cultures were vortexed ~ 1 min and 1 mL was aliquoted into cuvettes to measure the total cell densities (OD600). Aggregation was calculated as a ratio of planktonic to total cell density. For the aggregation experiment, overnight cultures were vortexed ~ 1 min and 1 mL was aliquoted into standard polypropylene spectrophotometer cuvettes and capped. Cuvettes were incubated statically at 12 °C, 24 °C (room temperature), and 37 °C. Cell densities were measured every hour by measuring OD600.
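The aggregation measure defined above (planktonic OD600 divided by total OD600 after vortexing) can be sketched in Python as follows. The readings are hypothetical values chosen for illustration, and the function names are not from the original work; a ratio close to 1 indicates little aggregation, whereas a low ratio indicates that most cells are in aggregates.

```python
from statistics import mean, stdev


def planktonic_ratio(od_planktonic: float, od_total: float) -> float:
    """Ratio of planktonic to total cell density (both measured as OD600)."""
    if od_total <= 0:
        raise ValueError("total OD600 must be positive")
    return od_planktonic / od_total


def summarize(pairs):
    """Mean and standard deviation of the ratio over replicate cultures."""
    ratios = [planktonic_ratio(p, t) for p, t in pairs]
    return mean(ratios), stdev(ratios)


# Hypothetical (planktonic OD600, total OD600) readings for three replicates
replicates = [(0.70, 2.00), (0.72, 2.00), (0.68, 2.00)]
avg, sd = summarize(replicates)
print(f"planktonic/total = {avg:.2f} (SD {sd:.2f})")
```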

DNA sequencing and sequence analyses

DNA for sequencing was isolated using the Qiagen Blood and Tissue DNA Isolation Kit. Genomic DNA was mechanically sheared using a Covaris g-TUBE. The SMRTbell template preparation kit 1.0 (Pacific Biosciences, Menlo Park, CA, USA) was used according to the PacBio standard protocol (10-kb template preparation using the BluePippin size-selection system, Sage Science). After SMRTbell preparation and polymerase binding, the libraries were loaded on SMRTcells via magbead loading and run on a PacBio RS II instrument (Pacific Biosciences) using C4 chemistry. DNA sequence data were assembled by the HGAP Assembly 2 and annotated by Prokka or NCBI’s Prokaryotic Genome Automatic Annotation Pipeline (PGAAP) [67]. E. coli C, K12, B, W, and Crooks genomes were analyzed by Roary, Mauve, and Geneious R11.

Optical mapping - high molecular weight DNA extraction

Cells from overnight culture were washed with PBS, resuspended in cell resuspension buffer, and embedded into low-melting-point agarose gel plugs (BioRad #170–3592, Hercules, CA, USA). Plugs were incubated with lysis buffer and proteinase K for 4 h at 50 °C. Plugs were washed, melted, and solubilized with GELase (Epicentre, Madison, WI, USA). Purified DNA was subjected to 4 h of drop-dialysis and DNA concentration was determined using the Quant-iT dsDNA Assay Kit (Invitrogen/Molecular Probes, Carlsbad, CA, USA). DNA quality was assessed with pulsed-field gel electrophoresis. High molecular weight DNA was labeled according to commercial protocols with the IrysPrep Reagent Kit (Bionano Genomics). Roughly 300 ng of purified genomic DNA was nicked with 7 U of nicking endonuclease Nt.BspQI (New England Biolabs, NEB) at 37 °C for 2 h in NEB Buffer 3. Nicked DNA was labeled with a fluorescent-dUTP nucleotide analog using Taq polymerase (NEB) for 1 h at 72 °C. Nicks were repaired with Taq ligase (NEB) in the presence of dNTPs. The backbone of fluorescently labeled DNA was stained with YOYO-1 (Invitrogen). Labeled DNA molecules entered nanochannel arrays of an IrysChip (Bionano Genomics) via automated electrophoresis. Molecules were linearized in the nanochannel arrays and imaged. An in-house image detection software detected the stained DNA backbone and locations of fluorescent labels across each molecule. The set of label locations within each molecule defined the single-molecule maps. The E. coli strain C reference sequence was in silico nicked with Nt.BspQI. Raw single-molecule maps were filtered by a minimum length of 150 kbp. Molecule maps were aligned to the E. coli reference map with OMBlast, an optical mapping alignment tool that uses a seed-and-extend approach and allows split-mapping [68]. Alignments were performed with the OMBlastMapper module (version 1.4a) using the following parameters: --writeunmap false --optresoutformat 2 --falselimit 8 --maxalignitem 2 --minconf 0. Molecule maps with partial alignments to regions flanking the putative inversion breakpoint coordinates were extracted from the alignment output file. Molecule maps were manually inspected for label matches in segments 5′ and 3′ to the putative inverted region and into the inversion. The non-aligned segments of these maps, which extended into the inverted region with label matches to the opposing side in a reverse fashion, were retained.
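Two of the computational steps above, in silico nicking of the reference and length filtering of single-molecule maps, can be illustrated with a minimal Python sketch. It assumes the Nt.BspQI recognition sequence GCTCTTC and records the start coordinate of each site on either strand rather than the exact nick position; the function names and the toy sequence are hypothetical, and this is not the software used in this work.

```python
NT_BSPQI_SITE = "GCTCTTC"  # assumed recognition sequence of Nt.BspQI


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def label_positions(seq: str, site: str = NT_BSPQI_SITE) -> list:
    """Start coordinates (0-based) of the site on the top or bottom strand."""
    seq = seq.upper()
    positions = set()
    for motif in (site, reverse_complement(site)):
        start = seq.find(motif)
        while start != -1:
            positions.add(start)
            start = seq.find(motif, start + 1)
    return sorted(positions)


def filter_by_length(molecule_lengths, min_length: int = 150_000) -> list:
    """Keep single-molecule maps at least min_length bp long (150 kbp here)."""
    return [length for length in molecule_lengths if length >= min_length]


# Toy reference (hypothetical) with one site on each strand
toy_reference = "AAGCTCTTCAAAGAAGAGCTT"
print(label_positions(toy_reference))
print(filter_by_length([120_000, 150_000, 310_000]))
```

The ordered list of label positions is the in silico map against which the labels observed on each imaged molecule are compared; an inversion appears as a stretch of labels that matches the reference only when read in the reverse direction.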

Statistical analysis

Statistical analysis was carried out in the R computing environment and in Graphpad. One-way ANOVA was calculated using an online tool (https://www.socscistatistics.com/tests/anova/default2.aspx) or R package. Relevant statistical information is included in the methods for each experiment. Error bars show standard deviation from the mean. Asterisks represent statistical significance at p < 0.05.
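For readers unfamiliar with the test, the F-ratio reported by a one-way ANOVA can be computed as sketched below in plain Python. The groups are hypothetical values used only to show the calculation (for example, planktonic/total ratios in three media); the analyses in this work were performed with the tools named above, and the p-value, which requires the F distribution, is not computed here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical data: three groups with three replicates each
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_ratio, df_b, df_w = one_way_anova_f(groups)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```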