Introduction

Numerous Enterobacter cloacae strains have been associated with plants as agents of disease [1–4], but E. cloacae strains have also been described as plant endophytes [5–8] and as biocontrol agents of fungal pathogens [9–16], and they have been associated with nosocomial infections in hospital settings [17–19]. E. cloacae belongs to the E. cloacae complex, which also includes E. asburiae, E. hormaechei, E. kobei, E. ludwigii, and E. nimipressuralis. Although 16S rRNA sequences are used for the initial identification of E. cloacae strains, they are not always sufficient for identification at the species and subspecies level [17]. Previous phylogenetic studies using multilocus sequence analysis of common housekeeping genes have revealed considerable diversity among the strains designated as E. cloacae: these strains form multiple clades, and only 3% of them group with the type strain E. cloacae subsp. cloacae ATCC 13047 [17,18]. The number of draft and complete E. cloacae genomes has increased recently; there are currently five complete and five draft E. cloacae genomes, with additional registered genome projects [20]. Sequencing and analysis of more E. cloacae genomes may establish a basis for explaining the diversity within the E. cloacae complex and provide new means for more definitive species or subspecies designation.

Classification and features

E. cloacae P101 was isolated from switchgrass (Panicum virgatum) growing on Buena Vista Quarry Prairie near Plover, Wisconsin, and is a Gram-negative, rod-shaped bacterium of the family Enterobacteriaceae (Table 1). Species within the genus Enterobacter are difficult to identify with biochemical and phylogenetic tests [18], but the increasing number of complete genomes is providing clues to the relationships among them. In a phylogenetic tree based on 16S rRNA sequences (Figure 1), E. cloacae strains group separately from other Enterobacter species with strong support (posterior probability of 100%). In this analysis, P101 is most closely related to E. cloacae EcWSU1 and E. cloacae ENHKU01, two other E. cloacae strains that were isolated from plants. E. cloacae EcWSU1 causes Enterobacter bulb decay of stored onions (Allium cepa) [41], and E. cloacae ENHKU01 was isolated as an endophyte from a pepper (Capsicum annuum) plant infected with Ralstonia solanacearum [42].

Figure 1.

Phylogenetic tree of 16S rRNA sequences from Enterobacter spp. with genome sequences. In a Bayesian phylogenetic analysis of the 16S rRNA region, E. cloacae strains grouped into a clade separate from the other Enterobacter species. Analyses were implemented in MrBayes [39], and DT-ModSel [40] with the Bayesian Information Criterion (BIC) was used to select the nucleotide substitution model best suited to the dataset. To ensure that the average standard deviation of split frequencies between runs fell below 1%, the Markov chain Monte Carlo (MCMC) search comprised two runs of four chains each for 10,000,000 generations. Pectobacterium carotovorum served as the outgroup. Numbers in parentheses after the bacterial names are the GenBank accession numbers of the genome sequences. The scale bar indicates the number of substitutions per site.
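The convergence criterion in the legend can be illustrated with a short calculation. The sketch below computes an average standard deviation of split frequencies across independent MCMC runs. The split labels and frequencies are hypothetical, and the use of the sample standard deviation and a 0.10 minimum split frequency are assumptions for illustration; this is not the MrBayes source code.

```python
from statistics import mean, stdev


def asdsf(run_freqs, min_freq=0.10):
    """Average standard deviation of split frequencies across MCMC runs.

    run_freqs: one dict per run, mapping a (hypothetical) split label to the
    fraction of sampled trees containing that split. A split missing from a
    run is treated as frequency 0. Splits below min_freq in every run are
    skipped (assumed filter, for illustration only).
    """
    splits = set().union(*run_freqs)
    sds = []
    for split in splits:
        freqs = [run.get(split, 0.0) for run in run_freqs]
        if max(freqs) < min_freq:
            continue
        sds.append(stdev(freqs))  # sample standard deviation across runs
    return mean(sds) if sds else 0.0


# Two hypothetical runs with closely matching split frequencies.
run1 = {"AB|CDE": 0.98, "ABC|DE": 0.50}
run2 = {"AB|CDE": 0.97, "ABC|DE": 0.51}
print(asdsf([run1, run2]) < 0.01)  # True: below the 1% threshold
```

With two runs, each split's standard deviation is simply the absolute difference between its two frequencies divided by the square root of 2, so runs that sample the same splits at nearly the same rates drive the average toward zero.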

Table 1. Classification and general features of Enterobacter cloacae P101 according to MIGS recommendations [21]

Genome sequencing and annotation

Genome project history

The E. cloacae P101 genome project was initiated as part of an undergraduate class at the University of Florida [36]. For the class, whole-genome sequence data were obtained with a Genome Sequencer 20 (454 Life Sciences, Branford, CT), and the students used PCR and sequencing to resolve some gaps. Although the project began with these data, little progress was made towards closing the genome. New sequencing data for P101 were therefore obtained on the PacBio RS platform at the Laboratory for Biotechnology and Bioanalysis at Washington State University, and the PCR products generated to confirm the genome assembly were sequenced at Elim Biopharmaceuticals (Hayward, CA). A BglII optical map of P101, obtained from OpGen (Gaithersburg, MD) in 2009, was also used in the genome assembly process. The complete chromosome sequence has been deposited in GenBank under accession number CP006580. Table 2 summarizes the P101 sequencing project.

Table 2. P101 Genome sequencing project information

Growth conditions and DNA isolation

E. cloacae P101 was cultured overnight at 28°C in LB broth [45] on a rotary shaker at 200 rpm. To remove excess exopolysaccharide before genomic DNA isolation, the cells were washed twice with equal volumes of sterile distilled water. Genomic DNA was then isolated from the washed cells with a Wizard Genomic DNA Purification Kit (Promega, Madison, WI), following the kit protocol for Gram-negative bacteria.

Genome sequencing and assembly

Genome sequencing was performed at the Laboratory for Biotechnology and Bioanalysis at Washington State University on a PacBio RS instrument (Pacific Biosciences, Menlo Park, CA). A small-insert library for circular consensus reads was prepared from 5 µg of P101 genomic DNA. The genomic DNA was first fragmented to 1 kb pieces with 20 shearing cycles at speed code 6 using the small shearing assembly of a HydroShear Plus (Digilab, Marlborough, MA). The library was then constructed with the DNA Template Prep Kit 2.0 (250 bp-<3 kb) (Pacific Biosciences, Menlo Park, CA). Two large-insert (10 kb) libraries for continuous long reads (CLR) were also prepared. For one library, 10 µg of genomic DNA was sheared with 20 shearing cycles at speed code 11 using the large shearing assembly of a HydroShear Plus. The second library was prepared from 5 µg of genomic DNA that was fragmented by passing it twice through a g-TUBE (Covaris, Woburn, MA) at 6,000 × g in a microcentrifuge. Both large-insert libraries were prepared with the DNA Template Prep Kit 2.0 (3–10 kb) (Pacific Biosciences). The resulting libraries were bound to the C2 DNA polymerase (Pacific Biosciences) and loaded into the zero-mode waveguides of SMRT cells (Pacific Biosciences) either by diffusion (small-insert library and first large-insert library) or with magnetic bead assistance (second large-insert library). The prepared libraries were loaded on a total of 16 SMRT cells. The four SMRT cells containing the small-insert library were each observed with two 55-minute movies, whereas each of the 12 SMRT cells containing large-insert libraries was observed with a single 120-minute movie. Before filtering, there were 1.5 Gbp of data in 1.2 million reads, with an average read length of 1,244 bp and an average read quality of 0.284. After reads shorter than 100 bp or below a minimum accuracy of 0.8 were removed, 0.96 Gbp of data remained, consisting of 287,709 reads with an average quality of 0.857 and an average read length of 3,323 bp.
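The read-filtering step described above (discarding reads shorter than 100 bp or with accuracy below 0.8, then reporting read count, total bases, and mean length and quality) can be sketched as follows. The `Read` record and the function names are hypothetical and chosen for illustration; this is not the SMRT Analysis filtering module or its API.

```python
from dataclasses import dataclass


@dataclass
class Read:
    # Hypothetical minimal read record: length in bp and predicted accuracy (0-1).
    length: int
    accuracy: float


def filter_reads(reads, min_length=100, min_accuracy=0.8):
    """Keep reads that are at least min_length bp long with accuracy >= min_accuracy."""
    return [r for r in reads if r.length >= min_length and r.accuracy >= min_accuracy]


def summarize(reads):
    """Return (read count, total bp, mean read length, mean accuracy)."""
    n = len(reads)
    if n == 0:
        return 0, 0, 0.0, 0.0
    total = sum(r.length for r in reads)
    return n, total, total / n, sum(r.accuracy for r in reads) / n


# Hypothetical reads: one too short, one below the accuracy cutoff.
reads = [Read(50, 0.90), Read(3000, 0.85), Read(4000, 0.70), Read(2000, 0.86)]
kept = filter_reads(reads)  # keeps the 3,000 bp and 2,000 bp reads
print(summarize(kept))
```

Because short, low-accuracy reads are removed, the filtered set has fewer reads but a higher mean length and quality, which matches the direction of the change reported above (1,244 bp and 0.284 before filtering versus 3,323 bp and 0.857 after).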

The raw data from the 16 SMRT cells were assembled with the HGAP protocol of SMRT Analysis v2.0.0 (Pacific Biosciences), using the standard bacterial HGAP assembly protocol with an expected genome size of 5.0 Mb. The same protocol was also used to assemble the data from 12 SMRT cells, excluding the four CLR SMRT cells run under instrument software v1.3.0, because of concerns that the way this software version handled quality scores could introduce artifacts into the assembly. The 20 contigs from the 16 SMRT cell assembly were used as the base set of contigs; the largest contig was 1.7 Mbp in length, the average coverage across all contigs was 131×, and the N50 was 591,864 bp. The 12 SMRT cell contig set was essentially the same, but it contained 28 contigs and had an N50 of 3,479,841 bp (also the length of its longest contig). The contigs were mapped to the P101 optical map, which allowed them to be ordered and overlapping regions to be joined. Primer pairs for regions throughout the genome assembly were designed and used to verify the assembly by PCR with GoTaq Polymerase (Promega), following the manufacturer's protocol, with 50 ng of P101 genomic DNA as template, an annealing temperature of 52°C, and a 1 min extension. Both strands of each PCR amplicon were sequenced with the same primers used for amplification. The assembled chromosome and the sequences of the PCR products were aligned with BioEdit (Ibis Biosciences, Carlsbad, CA).
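The N50 values quoted above summarize contig contiguity. A minimal sketch of the standard calculation is shown below: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the total assembled length. The function name and example lengths are illustrative only and are not the values of the P101 assemblies.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the total assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty input: no contigs


# Illustrative contig lengths (bp); total 1,000 bp, so half is 500 bp.
print(n50([100, 200, 300, 400]))  # 300: 400 + 300 = 700 >= 500
```

When a single contig holds at least half of the total assembled length, the loop stops at the first (longest) contig, which is why an N50 can equal the length of the longest contig, as in the 12 SMRT cell assembly.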

Genome annotation

The submission file for GenBank was prepared using Sequin [46]. The genome sequence was submitted to GenBank and annotated with the NCBI Prokaryotic Genome Annotation Pipeline [44].

Genome properties

The genome of E. cloacae P101 consists of one circular chromosome of 5,369,929 bp with an average G+C content of 54.4% (Table 3). There are 100 tRNA genes and 8 rRNA operons, each consisting of a 16S, a 23S, and a 5S rRNA gene. The genome contains 5,164 predicted protein-coding regions and 29 pseudogenes. A total of 4,419 genes (83.6%) have been assigned a predicted function, while the remainder have been designated as hypothetical proteins (Table 3). The numbers of genes assigned to each COG functional category are listed in Table 4. Of the annotated genes, 19.6% were not assigned to a COG or are of unknown function.
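The genome statistics above rest on two simple calculations: G+C content as a percentage of bases, and the fraction of genes with a predicted function. The sketch below shows both. The `gc_content` helper is hypothetical and ignores ambiguous bases such as N. The choice of denominator is an inference rather than something stated in the text: 4,419 out of 5,288 genes (5,164 protein-coding + 100 tRNA + 24 rRNA genes) gives the reported 83.6%, whereas dividing by the 5,164 protein-coding genes alone would give about 85.6%.

```python
def gc_content(seq):
    """Percent G+C among unambiguous A/C/G/T bases (case-insensitive)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


# Gene counts reported in the text above.
protein_coding = 5_164
trna_genes = 100
rrna_genes = 8 * 3  # 8 operons x (16S + 23S + 5S)
with_function = 4_419

# Assumption: the 83.6% figure uses protein-coding plus RNA genes as denominator.
total_genes = protein_coding + trna_genes + rrna_genes  # 5,288
pct_with_function = round(100 * with_function / total_genes, 1)
print(total_genes, pct_with_function)  # 5288 83.6
```

Applied to the full 5,369,929 bp chromosome sequence, `gc_content` would be expected to return a value close to the reported 54.4%.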

Table 3. P101 Genome Statistics
Table 4. Number of genes associated with the general COG functional categories