Introduction

Salmonella are Gram-negative, facultatively anaerobic bacteria of the family Enterobacteriaceae that inhabit the gastrointestinal tract of a wide variety of animals. More than 2,600 serotypes (also called serovars) are currently documented in the genus Salmonella. On the basis of chromosomal DNA hybridization experiments and multilocus enzyme electrophoresis (MLEE), Salmonella are currently classified into two species, S. enterica and S. bongori (formerly subgroup V). The species S. enterica is further divided into six subspecies, S. enterica subspecies enterica, S. enterica subspecies salamae, S. enterica subspecies arizonae, S. enterica subspecies diarizonae, S. enterica subspecies houtenae, and S. enterica subspecies indica, corresponding to the former subgroups I, II, IIIa, IIIb, IV and VI, respectively. Additionally, subgroup VII was described by Boyd et al. [1,2]. Salmonella taxonomy is a dynamic field of research, and many issues remain unsolved, especially regarding the definition of species [3-5]. To avoid confusion, we therefore use the traditional Salmonella classification system and the terms subgroup and serotype rather than subspecies or serovar (see [5] for a more detailed explanation). Most Salmonella infections in warm-blooded animals are caused by subgroup I serotypes, whereas non-subgroup I serotypes are typically associated with cold-blooded vertebrates and rarely colonize the intestines of warm-blooded animals.

Salmonella and Escherichia coli diverged from a common ancestor about 120–150 million years ago [6,7]. During this evolutionary process, several key genomic events, such as gene mutation and gene acquisition or loss, may have driven the divergence of the two lineages [8]. Importantly, numerous lines of evidence indicate that gene acquisition and loss are the major forces driving the evolution of virulence in Salmonella [9]. In fact, it has been postulated that the evolution of Salmonella-specific virulence can be divided into three phases. The first phase is the split of Salmonella from E. coli, marked by the acquisition of Salmonella pathogenicity island 1 (SPI-1), which is present in all lineages of Salmonella but absent from E. coli. SPI-1 encodes virulence factors that promote infection by Salmonella serotypes through different mechanisms, including invasion of intestinal epithelial cells [10], induction of neutrophil recruitment, and secretion of intestinal fluid [11-13]. The second phase is the divergence of Salmonella into S. bongori and S. enterica, during which the S. enterica lineage acquired SPI-2 [14-17], which contains genes encoding a type III secretion system required for survival in macrophages [18]. The third phase is the adaptation of Salmonella subgroup I to warm-blooded animals, but the key genomic events involved remain unknown.

Genome sequencing efforts in Salmonella have focused mostly on subgroup I serotypes, largely because of their pathogenicity in humans. In this study, we sequenced the genome of a strain from Salmonella subgroup IIIa (also known as Salmonella arizonae), which lies phylogenetically between Salmonella subgroups I and V. Given this evolutionary position, we anticipated that genomic comparisons of subgroup IIIa with other Salmonella subgroups, especially subgroups I and V, may provide novel insights into the evolutionary transition of Salmonella from cold- to warm-blooded hosts.

Organism information

Classification and features

S. arizonae is classified in the class Gammaproteobacteria, order Enterobacteriales, family Enterobacteriaceae and genus Salmonella (Table 1). S. arizonae was first described in 1939 under the name Salmonella dar es salaam; it was categorized as Salmonella subgroup IIIa and later named S. arizonae [19]. S. arizonae is a rare cause of human infection and is naturally found in reptiles.

Table 1 Classification and general features of S. arizonae RKS2983

We obtained RKS2983 from the Salmonella Genetic Stock Center (SGSC) as one of the strains of the Salmonella Reference Collection C (SARC6) [2]; it was originally isolated from a human in California in 1985. Like other Salmonella bacteria, it is a Gram-negative rod approximately 0.7 to 1.5 μm in diameter and 2 to 5 μm in length, facultatively anaerobic, non-spore-forming, and predominantly motile with peritrichous flagella. The bacteria were grown at 37°C in Luria broth at pH 7.2 to 7.6. Detailed information on the strain is available from the SGSC [20].

Genome sequencing information

Genome project history

This complete genome project was deposited in the Genomes On-Line Database (GOLD) and the complete genome sequence of strain RKS2983 was deposited at DDBJ/EMBL/GenBank under the accession CP006693.1. Table 2 presents the project information and its association with MIGS version 2.0 [21].

Table 2 Project information

Growth conditions and DNA isolation

S. arizonae RKS2983 was cultured to mid-logarithmic phase in 50 ml of Luria broth on a gyratory shaker at 37°C. Genomic DNA was isolated from the cells using a CTAB-based bacterial genomic DNA isolation method [22].

Genome sequencing and assembly

The genome of S. arizonae RKS2983 was sequenced on two platforms, SOLiD 3.0 and Illumina HiSeq 2000. First, genomic DNA was sequenced on the Illumina platform with a paired-end strategy (2 × 100 bp); details of library construction and sequencing are available on the Illumina website [23]. The Illumina HiSeq 2000 reads were assembled with SOAPdenovo v1.05, yielding 103 scaffolds with a total genome size of 4.5 Mb. Next, genomic DNA was sheared into 3 kb fragments with a HydroShear instrument and sequenced on a SOLiD sequencer with a mate-pair strategy (2 × 50 bp), following the instrument manual (Applied Biosystems). The two data sets were then assembled together with Velvet v1.2.09, yielding a final assembly of 20 scaffolds. Gaps between contigs were closed by PCR amplification followed by sequencing on an ABI 3730 sequencer.
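The assembly statistics reported above (number of scaffolds and total size) can be summarized directly from an assembly FASTA file. The following minimal Python sketch is an illustration only, not the pipeline used in this study; it assumes scaffolds are stored in FASTA format with unresolved gaps written as runs of `N`, and the function names (`read_fasta`, `count_gaps`, `assembly_summary`) are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence} (uppercased)."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # record ID = first word of header
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def count_gaps(seq: str, min_run: int = 1) -> int:
    """Count runs of 'N' at least min_run bases long (assumed to mark gaps)."""
    runs = 0
    run_len = 0
    for base in seq + "X":  # sentinel character closes a trailing N run
        if base == "N":
            run_len += 1
        else:
            if run_len >= min_run:
                runs += 1
            run_len = 0
    return runs


def assembly_summary(scaffolds: Dict[str, str]) -> Dict[str, int]:
    """Scaffold count, total length in bp, and number of N-run gaps."""
    lengths = [len(s) for s in scaffolds.values()]
    return {
        "scaffolds": len(lengths),
        "total_bp": sum(lengths),
        "gaps": sum(count_gaps(s) for s in scaffolds.values()),
    }
```

For example, `assembly_summary(read_fasta(">s1\nACGTNNNNACGT\n>s2\nGGCC\n".splitlines()))` returns `{'scaffolds': 2, 'total_bp': 16, 'gaps': 1}`. Tracking the gap count alongside the scaffold count is one way to follow progress as gaps are closed by PCR.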

Genome annotation

Genes were predicted with Glimmer 3 [25] through the Rapid Annotation using Subsystem Technology (RAST) server [24], followed by manual curation. The predicted coding sequences (CDSs) were translated and used to search the National Center for Biotechnology Information (NCBI) non-redundant database and the Clusters of Orthologous Groups (COG) database. These data sources were combined to assign a product description to each predicted protein. We then compared the predicted genes with the annotated genes of four available Salmonella genomes: S. typhi Ty2, S. typhimurium LT2 (AE006468) [26], S. arizonae RKS2980 (CP000880) [27] and S. bongori NCTC12419 (NC_015761) [17]. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [28], RNAmmer [29], Rfam [30] and TMHMM [31].

Genome properties

The genome (Figure 1) consists of a chromosome of 4,574,836 bp (51.5% GC content) with 4,390 genes predicted, including 4,203 protein-coding genes, 22 rRNA genes, 82 tRNA genes and 98 pseudogenes. The properties and the statistics of the genome are summarized in Tables 3 and 4.

Figure 1

Graphical circular map of the S. arizonae RKS2983 genome. From the outside to the center: genes on the forward strand (colored by COG category), genes on the reverse strand (colored by COG category), GC content, and GC skew. The map was generated with the CGView software.
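The GC content and GC skew tracks in such a map are conventionally computed in fixed windows along the chromosome, with GC skew defined as (G − C)/(G + C). The Python sketch below illustrates that calculation; the window and step sizes are hypothetical defaults rather than the settings used for this figure, and the sketch treats the chromosome as linear (no wrap-around at the replication origin).

```python
from typing import List, Tuple


def gc_windows(seq: str, window: int = 10_000,
               step: int = 10_000) -> List[Tuple[int, float, float]]:
    """Return (window_start, GC fraction, GC skew) for each window.

    GC fraction = (G + C) / window length (ambiguous bases such as N stay
    in the denominator). GC skew = (G - C) / (G + C); windows with no G
    or C are given a skew of 0. The sequence is treated as linear.
    """
    seq = seq.upper()
    results = []
    # A sequence shorter than one window yields a single window at position 0
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_fraction = (g + c) / len(chunk) if chunk else 0.0
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_fraction, skew))
    return results
```

For example, `gc_windows("GGGCAAAT", window=4, step=4)` returns `[(0, 1.0, 0.5), (4, 0.0, 0.0)]`. A sign change in cumulative GC skew is commonly used to locate the replication origin and terminus, which is why skew is plotted alongside GC content.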

Table 3 Nucleotide content and gene counts of the genome
Table 4 Number of genes associated with the 25 general COG functional categories

Insights from the genome sequence

We first examined the genetic relatedness of Salmonella and E. coli. To do so, we used BLAST to identify genes common to the 25 sequenced strains analyzed in this study, defining a gene as common when its matches showed >70% DNA identity and a gene length ratio >0.7. The resulting 945 common genes were aligned individually with the multiple sequence alignment program MAFFT [32] and concatenated. A phylogenetic tree was constructed from the concatenated alignment with the neighbor-joining method in MEGA 4.0 [33], using 1,000 bootstrap replicates. In the tree, S. bongori was positioned between Salmonella subgroup I and E. coli, S. arizonae RKS2983 was positioned between Salmonella subgroup I and S. bongori, and all Salmonella subgroup I strains clustered together (Figure 2).
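The common-gene criterion can be expressed as a filter over BLAST tabular output (`-outfmt 6`, in which column 1 is the query ID, column 2 the subject ID and column 3 the percent identity). This Python sketch rests on two interpretive assumptions not stated in the text: a reference-based design, in which one genome's genes are searched against each other genome and a gene counts as common only if it passes in every comparison, and a "gene length ratio" defined as the shorter gene length divided by the longer one. The function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def passes_criteria(pident: float, qlen: int, slen: int,
                    min_identity: float = 70.0,
                    min_len_ratio: float = 0.7) -> bool:
    """Apply the >70% identity and >0.7 length-ratio thresholds.

    ASSUMPTION: length ratio = shorter gene length / longer gene length.
    """
    ratio = min(qlen, slen) / max(qlen, slen)
    return pident > min_identity and ratio > min_len_ratio


def matched_genes(blast_rows: Iterable[str], qlens: Dict[str, int],
                  slens: Dict[str, int]) -> Set[str]:
    """Return query genes with at least one hit passing the criteria.

    Rows are tab-separated BLAST -outfmt 6 lines; gene lengths come from
    separate lookup tables (query and subject genes).
    """
    hits: Set[str] = set()
    for row in blast_rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        qid, sid, pident = fields[0], fields[1], float(fields[2])
        if passes_criteria(pident, qlens[qid], slens[sid]):
            hits.add(qid)
    return hits


def common_genes(per_genome_hits: Iterable[Set[str]]) -> Set[str]:
    """Reference genes that pass against every other genome (intersection)."""
    sets = list(per_genome_hits)
    return set.intersection(*sets) if sets else set()
```

In this design, `matched_genes` is run once per comparison genome and `common_genes` intersects the results; applying the same thresholds in both directions (reciprocal searches) would be a stricter alternative.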

Figure 2

Phylogenetic tree highlighting the position of S. arizonae RKS2983 (shown in bold) relative to strains of other Salmonella lineages. The corresponding GenBank accession numbers are displayed in parentheses. The tree was built from the concatenated nucleotide sequences of 945 genes conserved in all strains. Individual orthologous sequences were aligned with MAFFT [32] and concatenated. The phylogenetic tree was constructed with the neighbor-joining method in MEGA 4.0 [33]. Bootstrap values are shown at branch points.
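To illustrate the concatenate-then-neighbor-joining workflow described in the caption, the Python sketch below builds a supermatrix from per-gene alignments, computes pairwise distances, and applies the standard neighbor-joining Q-criterion. The distance model here (uncorrected p-distance, skipping gap columns) is an assumption, since the substitution model used in MEGA 4.0 is not stated; bootstrap resampling is omitted; and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def concatenate(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments (same taxa in each) into one supermatrix."""
    taxa = sorted(alignments[0])
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites, skipping gap columns."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        return 0.0
    return sum(x != y for x, y in sites) / len(sites)


def distance_matrix(seqs: Dict[str, str]) -> Dict[FrozenSet[str], float]:
    """Pairwise p-distances keyed by unordered taxon pairs."""
    return {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(sorted(seqs), 2)}


def neighbor_joining(dist: Dict[FrozenSet[str], float], taxa: List[str]) -> str:
    """Return an unrooted neighbor-joining tree as a Newick string."""
    d = dict(dist)
    nodes = list(taxa)
    while len(nodes) > 2:
        n = len(nodes)
        # r[i]: total distance from node i to all other active nodes
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Q-criterion: join the pair minimizing (n - 2) * d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        # Branch lengths from i and j to their new parent node
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"  # Newick text doubles as node label
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (d[frozenset((i, k))]
                                              + d[frozenset((j, k))] - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a},{b}:{d[frozenset((a, b))]:.4f});"
```

For four taxa whose distances fit the tree ((A,B),(C,D)) exactly (d = 2 within pairs, 4 between them), `neighbor_joining` returns `((A:1.0000,B:1.0000),(C:1.0000,D:1.0000):2.0000);`, recovering both the topology and the branch lengths. Taxon names must not contain Newick special characters such as parentheses, commas or colons.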

The core gene data of S. arizonae RKS2983, S. bongori NCTC 12419 and S. typhimurium LT2 (representing Salmonella subgroup I) are presented in Figure 3. A total of 2,823 genes are common to all three genomes, and 926 genes are specific to RKS2983. SPI-2 genes are among the 516 genes shared by RKS2983 and LT2 and are absent from S. bongori NCTC 12419. As many as 1,017 genes are present in LT2 but absent from the other two genomes; we postulate that some of these genes may be associated with virulence in warm-blooded hosts.
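The Venn counts above partition genes by the exact set of genomes in which they occur, so each gene falls into exactly one region. A minimal Python sketch of that partition follows; it assumes each gene has already been mapped to a shared orthologous-group identifier (hypothetical IDs such as `c1`) using the BLAST criteria described earlier, and the function name is hypothetical.

```python
from typing import Dict, FrozenSet, Set


def venn_partition(groups: Dict[str, Set[str]]) -> Dict[FrozenSet[str], int]:
    """Count genes in each exclusive region of a Venn diagram.

    groups maps a genome name to the set of orthologous-group IDs it
    contains; each ID is assigned to the region defined by exactly the
    genomes in which it occurs.
    """
    membership: Dict[str, Set[str]] = {}
    for genome, genes in groups.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(genome)
    counts: Dict[FrozenSet[str], int] = {}
    for genomes in membership.values():
        key = frozenset(genomes)
        counts[key] = counts.get(key, 0) + 1
    return counts
```

With genome names such as `"RKS2983"`, `"NCTC12419"` and `"LT2"`, the core region is the key containing all three names, and the LT2-specific region is `frozenset({"LT2"})`.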

Figure 3

Venn diagram showing the core genes in S. arizonae RKS2983, S. bongori NCTC 12419 and S. typhimurium LT2. Core genes were identified using BLAST with the criteria >70% DNA identity and >0.7 gene length ratio.

We compared these genomes for the presence or absence of Salmonella pathogenicity islands (SPIs) and found that S. arizonae RKS2983 shares some SPIs with S. bongori NCTC 12419 and others with S. typhimurium LT2 or S. typhi Ty2 (Table 5). This distribution provides an opportunity to study the acquisition of SPIs during the transition of Salmonella from cold- to warm-blooded animal pathogens.

Table 5 Distribution of known SPIs in four representative genomes of the genus Salmonella

Conclusions

S. arizonae is phylogenetically positioned between S. bongori and Salmonella subgroup I and shares some pathogenicity-associated genes with S. bongori and others with Salmonella subgroup I lineages. Therefore, analyses of the S. arizonae genome may provide important clues to the key genomic events that might have facilitated the evolution of warm-blooded animal pathogens from cold-blooded parasites.