Introduction

Pantoea septica Brady et al. 2010 was first isolated from a human stool sample in New Jersey, USA [1]. Pantoea septica strain FF5 (= CSUR P3024 = DSM 27843) was cultivated from the skin of a healthy Senegalese woman [2]. To date, the genus Pantoea comprises 22 species and 2 subspecies [3, 4], and no genome had been described for Pantoea septica when this paper was written. Pantoea species have mostly been isolated from the environment, particularly from plants, seeds and vegetables, and several are phytopathogenic [5]. Some species, such as P. agglomerans, P. septica and P. eucrina, are also frequently isolated from humans, in whom they can cause opportunistic infections [16].

We provide here a summary classification and a set of features for Pantoea septica strain FF5, together with the description of its genome sequence and annotation.

Organism information

Classification and features

A skin sample was collected with a swab from a healthy Senegalese volunteer living in Dielmo (a rural village in the Guinean-Sudanian zone of Senegal) in December 2012 (Table 1). This 35-year-old woman was included in a research project approved by the Ministry of Health of Senegal, the assembled village population and the National Ethics Committee of Senegal (CNERS, agreement number 09–022), as published elsewhere [7]. Strain FF5 (Table 1) was isolated by aerobic cultivation on 5 % sheep blood-enriched Columbia agar (BioMérieux, Marcy l’Etoile, France). Because the 16S rRNA gene sequence does not discriminate between Pantoea species, a comparative analysis of rpoB nucleotide sequences from strain FF5 and other Pantoea species was performed. Strain FF5 exhibited 99.7 % sequence identity with P. septica, its phylogenetically closest validly published Pantoea species (Fig. 1) [8]. Cells grown on agar are motile, Gram-negative rods with a diameter of 0.79–1.06 μm and a length of 1.25–2.04 μm.

Table 1 Classification and general features of Pantoea septica strain FF5 according to the MIGS recommendations [12]
Fig. 1

Phylogenetic tree showing the position of Pantoea septica strain FF5 relative to other strains within the genus Pantoea. The rpoB sequences were aligned using MUSCLE [31], and the phylogenetic tree was inferred using the maximum likelihood method under the Kimura 2-parameter model in MEGA. Numbers at the nodes are bootstrap support values (%) obtained from 1,000 replicates and used to generate a majority-rule consensus tree. The scale bar represents 0.02 substitutions per site
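The Kimura 2-parameter (K2P) model named in the caption treats transitions (A↔G, C↔T) separately from transversions. As an illustration only, the Python sketch below computes the pairwise K2P distance d = −(1/2) ln(1 − 2P − Q) − (1/4) ln(1 − 2Q), where P and Q are the proportions of transitional and transversional differences between two aligned sequences. It is not MEGA's implementation (MEGA fits the model by maximum likelihood over the whole tree); skipping gapped or ambiguous sites is an assumption, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites with a gap or ambiguous base in either sequence are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # ignore gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; any other change is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


if __name__ == "__main__":
    # One transition (A vs G) over 10 sites: P = 0.1, Q = 0
    print(round(kimura_2p_distance("GCGTACGTAC", "ACGTACGTAC"), 4))  # 0.1116
```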

Strain FF5 was catalase-positive and oxidase-negative. Using the API 20E system (BioMérieux), positive reactions were detected for β-galactosidase, citrate, tryptophan deaminase, mannitol, inositol, rhamnose, saccharose, melibiose, arabinose and sorbitol. Negative reactions were observed for arginine dihydrolase, lysine decarboxylase, hydrogen sulfide (H2S), urease, indole and amygdalin. Using API 50 CH (BioMérieux), positive reactions were observed for glycerol, D-ribose, D-xylose, D-galactose, D-glucose, D-fructose, D-mannose, D-maltose, D-trehalose, D-lyxose and D-fucose. Negative reactions were observed for erythritol, L-xylose, D-adonitol, methyl β-D-xylopyranoside, L-sorbose, dulcitol, methyl α-D-mannopyranoside, methyl α-D-glucopyranoside, arbutin, salicin, D-cellobiose, inulin, D-melezitose, starch, potassium gluconate, glycogen and 5-keto-D-gluconate. Using API ZYM, positive reactions were observed for alkaline phosphatase, esterase (C4), esterase lipase (C8), leucine arylamidase, acid phosphatase, naphthol-AS-BI-phosphohydrolase and . Negative reactions were observed for valine arylamidase, trypsin, α-chymotrypsin, α-galactosidase, α-glucosidase, β-glucosidase, N-acetyl-β-glucosaminidase, α-mannosidase and α-fucosidase. Strain FF5 is susceptible to ceftriaxone, imipenem, gentamicin and ciprofloxacin but resistant to penicillin, amoxicillin, ticarcillin, amoxicillin-clavulanic acid, trimethoprim-sulfamethoxazole, colistin and vancomycin. These phenotypic characteristics are consistent with the assignment of strain FF5 to Pantoea septica.

Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry protein analysis was performed using a Microflex spectrometer (Bruker Daltonics, Leipzig, Germany), as previously reported [9]. Identification against the instrument database was validated or invalidated using the score thresholds established by Bruker Daltonics: a score ≥ 2 with a species bearing a validly published name allows identification at the species level, a score ≥ 1.7 and < 2 allows identification at the genus level, and a score < 1.7 does not allow any identification. Twelve distinct deposits of strain FF5 were made from 12 isolated colonies. Each smear was overlaid with 2 μL of matrix solution (saturated solution of alpha-cyano-4-hydroxycinnamic acid) and dried for 5 min, as previously reported [9, 10]. The spectra from the 12 colonies were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching (with default parameter settings) against 6252 bacterial spectra. Spectra were compared with the Bruker database, which contained spectra from ten validly named Pantoea species. The spectra obtained were similar to those of P. septica, and strain FF5 obtained a score of 2.3, supporting its identification as P. septica. Its reference mass spectrum was added to our database (Fig. 2).
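The Bruker score thresholds listed above amount to a simple decision rule. A minimal Python sketch, assuming a single log-score per isolate; the function name and return labels are hypothetical and are not part of the MALDI BioTyper software:

```python
def interpret_biotyper_score(score: float) -> str:
    """Return the identification level implied by a MALDI BioTyper log-score.

    Thresholds as stated in the text: >= 2.0 species level,
    1.7 <= score < 2.0 genus level, < 1.7 no identification.
    """
    if score >= 2.0:
        return "species"
    if score >= 1.7:
        return "genus"
    return "none"


if __name__ == "__main__":
    # Score reported for strain FF5 against P. septica reference spectra
    print(interpret_biotyper_score(2.3))  # species
```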

Fig. 2

Reference mass spectrum from Pantoea septica strain FF5. Spectra from 12 individual colonies were analyzed and a reference spectrum was generated

Genome sequencing information

Genome project history

Pantoea septica strain FF5 was selected for sequencing because no genome of P. septica had previously been described. In addition, this strain is part of a study aiming to characterize the skin flora of healthy Senegalese people. It is the 17th Pantoea genome to be sequenced and the first within P. septica. The genome, deposited in GenBank under accession number CCAQ000000000, consists of 4 scaffolds and 37 contigs. Table 2 shows the project information and its compliance with MIGS version 2.0 [11]. Associated MIGS records are detailed in Additional file 1: Table S1.

Table 2 Project information

Growth conditions and genomic DNA preparation

Pantoea septica strain FF5 (= CSUR P3024 = DSM 27843) was grown aerobically on 5 % sheep blood-enriched Columbia agar (bioMérieux) at 37 °C. Bacteria grown on four Petri dishes were resuspended in 5 × 100 μL of TE buffer, and 150 μL of this suspension was diluted in 350 μL of 10X TE buffer with 25 μL of proteinase K and 50 μL of sodium dodecyl sulfate for lysis. This preparation was incubated overnight at 56 °C. DNA was purified using three successive phenol-chloroform extractions and ethanol precipitations at −20 °C of at least two hours each. Following centrifugation, the DNA was resuspended in 65 μL of EB buffer. The genomic DNA concentration, measured using the Qubit assay with the high-sensitivity kit (Life Technologies, Carlsbad, CA, USA), was 46.06 ng/μL.

Genome sequencing and assembly

The genomic DNA of Pantoea septica was sequenced on a MiSeq instrument (Illumina Inc, San Diego, CA, USA) using two strategies: paired-end and mate-pair. Both libraries were barcoded so that they could be pooled, respectively, with 10 other genomic projects prepared with the Nextera XT DNA sample prep kit (Illumina) and with 11 other projects prepared with the Nextera Mate-Pair sample prep kit (Illumina).

Genomic DNA was diluted to 1 ng/μL to prepare the paired-end library. The “tagmentation” step fragmented and tagged the DNA, with an optimal size distribution of 2.25 kb. Limited-cycle PCR amplification (12 cycles) completed the tag adapters and introduced dual-index barcodes. After purification on AMPure XP beads (Beckman Coulter Inc, Fullerton, CA, USA), the libraries were normalized on specific beads according to the Nextera XT protocol (Illumina). Normalized libraries were pooled into a single library for sequencing on the MiSeq. The pooled single-strand library was loaded onto the reagent cartridge and then onto the instrument along with the flow cell. Automated cluster generation and paired-end sequencing with dual-index reads were performed in a single 39-h run in 2 × 250-bp mode. A total of 5.91 Gb was obtained from a cluster density of 654 K/mm2, with 93.7 % of clusters passing quality control filters (12,204,000 clusters). Within this run, the index representation for P. septica was 2.25 %, and 257,400 reads passing quality filters were obtained for P. septica.

The mate-pair library was prepared from 1 μg of genomic DNA following the Nextera Mate-Pair Illumina guide. The genomic DNA sample was simultaneously fragmented and tagged with a mate-pair junction adapter. The fragmentation profile was validated on an Agilent 2100 BioAnalyzer (Agilent Technologies Inc, Santa Clara, CA, USA) using a DNA 7500 LabChip. The DNA fragments ranged in size from 1.5 kb to 14 kb, with an optimal size of 9 kb. No size selection was performed, and 600 ng of tagmented fragments were circularized. The circularized DNA was mechanically sheared into small fragments in microtubes on a Covaris S2 device (Covaris, Woburn, MA, USA). The library profile was visualized on a High-Sensitivity Bioanalyzer LabChip (Agilent Technologies Inc, Santa Clara, CA, USA). The libraries were normalized to 2 nM and pooled. After denaturation and dilution to 10 pM, the pool of libraries was loaded onto the reagent cartridge and then onto the instrument along with the flow cell. Automated cluster generation and sequencing were performed in a single 39-h run in 2 × 250-bp mode.

A total of 3.2 Gb was obtained from a cluster density of 690 K/mm2, with 95.4 % of clusters passing quality control filters (13,264,000 clusters). The index representation for P. septica was 7.26 % within this run, and a total of 918,753 reads passing quality filters were obtained for P. septica.

Genome annotation

Open reading frames (ORFs) were predicted using Prodigal [12] with default parameters, and predicted ORFs spanning a sequencing gap region were removed. Protein sequences were functionally assessed by BLASTP comparison against the GenBank [13] and Clusters of Orthologous Groups (COG) databases. tRNAs, rRNAs, signal peptides and transmembrane helices were identified using tRNAscan-SE 1.21 [14], RNAmmer [15], SignalP [16] and TMHMM [17], respectively. Artemis [18] was used for data management and DNA Plotter [19] for visualization of genomic features. In-house Perl and bash scripts were used to automate these routine tasks. ORFans were defined as sequences with no homolog in the non-redundant (nr) database, a BLASTP hit being considered significant if its E-value was lower than 1e-03 for alignments longer than 80 amino acids, or lower than 1e-05 for shorter alignments. PHAST was used to identify, annotate and graphically display prophage sequences within bacterial genomes or plasmids [20].
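The ORFan criterion above can be read as a length-dependent E-value cutoff applied to BLASTP hits against nr: a protein is an ORFan when none of its hits passes the cutoff. The Python sketch below encodes that reading. The `BlastHit` record, the function names, and the treatment of alignments of exactly 80 amino acids (assigned to the stricter 1e-05 cutoff, since the text leaves this case open) are assumptions, not the authors' in-house scripts.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical minimal record for one BLASTP hit against nr."""
    subject_id: str
    alignment_length: int  # aligned length in amino acids
    evalue: float


def is_significant(hit: BlastHit) -> bool:
    """Length-dependent E-value cutoff: < 1e-03 for alignments longer than
    80 aa, < 1e-05 otherwise (exactly 80 aa is treated as short here)."""
    cutoff = 1e-3 if hit.alignment_length > 80 else 1e-5
    return hit.evalue < cutoff


def is_orfan(hits: list[BlastHit]) -> bool:
    """A protein is an ORFan when none of its nr hits is significant."""
    return not any(is_significant(hit) for hit in hits)


if __name__ == "__main__":
    # A 60-aa alignment at E = 1e-04 fails the stricter short-alignment cutoff
    print(is_orfan([BlastHit("hypothetical_hit", 60, 1e-4)]))  # True
```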

To estimate nucleotide sequence similarity at the genome level between P. septica and seven other members of the genus Pantoea and four members of the genus Enterobacter, we determined the average genomic identity of orthologous gene sequences (AGIOS) as follows. Orthologous proteins were detected using the Proteinortho software [21] (with the following parameters: E-value 1e-5, 30 % identity, 50 % coverage and algebraic connectivity of 50 %), and genomes were compared pairwise. After fetching the nucleotide sequences corresponding to the orthologous proteins for each pair of genomes, we determined the mean percentage of nucleotide sequence identity using the Needleman-Wunsch global alignment algorithm. The script created to calculate AGIOS values, named MAGi (Marseille Average genomic identity), is written in Perl using BioPerl modules. Digital DNA-DNA hybridization (dDDH) values were also estimated using the Genome-to-Genome Distance Calculator (GGDC) web server, as previously reported [22].
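The AGIOS computation described above reduces to two steps: a global alignment of each orthologous gene pair, then an average of the resulting identities. MAGi itself is written in Perl/BioPerl; the Python sketch below mirrors only the logic described here. The scoring scheme (match +1, mismatch −1, linear gap −2), the identity definition (identical columns divided by alignment length), and all function names are assumptions, and ortholog detection with Proteinortho is taken as already done.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str]:
    """Global alignment (linear gap penalty); returns the two aligned strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions divided by total alignment length, in %."""
    aln_a, aln_b = needleman_wunsch(a.upper(), b.upper())
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y)
    return 100.0 * identical / len(aln_a)


def agios(ortholog_pairs: list[tuple[str, str]]) -> float:
    """Mean nucleotide identity over the orthologous gene pairs of two genomes."""
    if not ortholog_pairs:
        raise ValueError("no orthologous gene pairs supplied")
    identities = [percent_identity(x, y) for x, y in ortholog_pairs]
    return sum(identities) / len(identities)
```

The quadratic dynamic-programming table is adequate for gene-length sequences; for whole-genome runs, an established aligner such as EMBOSS needle would be a more practical choice than this pure-Python version.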

Genome properties

The genome of P. septica strain FF5 is 4,548,444 bp long (1 chromosome, no plasmid), with a G + C content of 59.1 % (Fig. 3). Of the 4193 predicted genes, 4125 were protein-coding genes and 68 were RNA genes. A total of 3040 genes (72.50 %) were assigned a putative function, and 522 genes were annotated as hypothetical proteins. The properties and statistics of the genome are presented in Table 3, and the distribution of genes into COG functional categories in Table 4. A total of 214 genes (5.18 %) were identified as ORFans.

Fig. 3

Graphical circular map of the chromosome of P. septica strain FF5. From the outside in, the two outer circles show open reading frames oriented in the forward and reverse directions, respectively, colored by COG category. The third circle marks the rRNA gene operon (red) and tRNA genes (green). The fourth circle shows the G + C content plot. The innermost circle shows GC skew, with purple and olive indicating negative and positive values, respectively
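The two inner circles of Fig. 3 plot G + C content and GC skew, (G − C)/(G + C), along the chromosome. A minimal Python sketch of both quantities computed over sliding windows; the window and step sizes are arbitrary assumptions and need not match the settings used with DNA Plotter:

```python
def gc_content(seq: str) -> float:
    """G + C as a percentage of unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    acgt = g + c + s.count("A") + s.count("T")
    return 100.0 * (g + c) / acgt if acgt else 0.0


def gc_skew_windows(seq: str, window: int = 10_000, step: int = 5_000):
    """Yield (start, skew) for each window, with skew = (G - C) / (G + C).

    Positive values indicate an excess of G over C on the forward strand.
    """
    s = seq.upper()
    last_start = max(len(s) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = s[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if g + c else 0.0
        yield start, skew
```

In such a plot, positive skew values (olive in Fig. 3) mark windows with an excess of G over C, and negative values (purple) the reverse.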

Table 3 Nucleotide content and gene count levels of the genome
Table 4 Number of genes associated with general COG functional categories

Insights from genome sequence

Here, we compared the genome of strain FF5 with 11 other genome sequences: Pantoea ananatis strain LMG 20103, P. vagans strain C9-1, P. ananatis strain LMG 5342, P. ananatis strain AJ13355, P. ananatis strain PA13, P. agglomerans strain 299R, P. stewartii subsp. stewartii strain DC283, Enterobacter cloacae subsp. dissolvens strain SDM, E. aerogenes strain EA1509E, E. asburiae strain LF7a and E. cloacae strain EcWSU1 (Table 5).

Table 5 Comparison of Pantoea septica strain FF5 with other genomes of several Pantoea species and some Enterobacter species

Table 5 shows a comparison of genome size, G + C content, coding density and number of proteins for these genomes.

The G + C content of P. septica strain FF5 (59.1 %) differed by more than 1 % from that of all other compared species within the genus Pantoea: P. vagans strain C9-1 (55.55 %), P. ananatis strains LMG 5342, AJ13355 and PA13 (53.45, 53.76 and 53.66 %, respectively), P. agglomerans strain 299R (54.3 %) and P. stewartii subsp. stewartii strain DC283 (53.8 %).

Since G + C content has been shown to vary by at most 1 % within a species [23], these values support the classification of strain FF5 in a species distinct from those compared.
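Applied to the values quoted above, the 1 % criterion is a direct comparison. The Python sketch below, which reads "1 %" as one percentage point of G + C content (an assumption), checks each compared Pantoea genome against strain FF5; the variable and function names are illustrative.

```python
FF5_GC = 59.1  # G + C content (%) of P. septica strain FF5

# G + C contents (%) of the other Pantoea genomes, as reported in the text
OTHER_PANTOEA_GC = {
    "P. vagans C9-1": 55.55,
    "P. ananatis LMG 5342": 53.45,
    "P. ananatis AJ13355": 53.76,
    "P. ananatis PA13": 53.66,
    "P. agglomerans 299R": 54.3,
    "P. stewartii subsp. stewartii DC283": 53.8,
}

# Maximum within-species G + C deviation [23], read as percentage points
MAX_INTRASPECIES_DEVIATION = 1.0


def gc_deviations(reference: float, others: dict[str, float]) -> dict[str, float]:
    """Absolute G + C difference (percentage points) from the reference genome."""
    return {name: abs(reference - gc) for name, gc in others.items()}


if __name__ == "__main__":
    for name, diff in gc_deviations(FF5_GC, OTHER_PANTOEA_GC).items():
        status = "exceeds" if diff > MAX_INTRASPECIES_DEVIATION else "within"
        print(f"{name}: {diff:.2f} points ({status} the intraspecies limit)")
```

The smallest difference, 3.55 percentage points (P. vagans strain C9-1), is well above the 1 % limit.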

The orthologous gene comparison of P. septica strain FF5 with other closely related species is summarized in Table 6. Intraspecies AGIOS values ranged from 99.06 to 99.33 % for P. ananatis (Table 7). Interspecies AGIOS values ranged from 77.46 to 84.94 % within the genus Pantoea, and from 71.27 to 72.57 % between Pantoea and Enterobacter species (Table 7). P. septica exhibited AGIOS values ranging from 77.7 to 80.5 % with the other Pantoea species and from 72.38 to 73.26 % with Enterobacter species (Table 7).

Table 6 Orthologous gene comparison of Pantoea septica strain FF5 with other closely related species
Table 7 dDDH values (upper right) and AGIOS values (lower left) obtained by comparison of all studied genomes

Conclusions

We describe the genome of Pantoea septica strain FF5, the first reported genome of P. septica, together with the phenotypic and phylogenetic characteristics of this strain. P. septica strain FF5 was isolated from the skin flora of a healthy 35-year-old Senegalese woman. The genome sequence of P. septica strain FF5 has been deposited in GenBank under accession number CCAQ000000000.