Background

Salmonella enterica, the causative agent of salmonellosis, is a human and animal pathogen that causes substantial economic losses and major public health concerns worldwide [1, 2]. Salmonellosis is estimated to be responsible for 93.8 million global human cases annually, among which 80.3 million cases are estimated to be foodborne [3]. In the United States alone, salmonellosis contributes to approximately 1.2 million human illnesses [4] and medical costs associated with salmonellosis total $3.7 billion every year [5]. The emergence and global spread of antimicrobial-resistant (AMR) S. enterica have further raised the public concern, as AMR S. enterica compromise the ability to treat infections in humans and animals [6, 7]. In addition, previous studies have suggested that AMR strains of S. enterica may be more virulent than susceptible ones [8].

S. enterica contains > 2500 recognized serotypes, which display a broad range of epidemiological and ecological characteristics. Host-adapted serotypes typically induce systemic disease in a limited number of host species, while non-host-adapted serotypes usually cause self-limiting gastroenteritis and less commonly systemic disease in a wide range of hosts [9]. Previous studies [10,11,12] have provided initial evidence that S. enterica serotype differences in host ranges and virulence characteristics are associated with genomic characteristics (e.g., genetic diversity, gene presence and absence patterns). These genomic characteristics are consequences of a variety of evolutionary and population genetics processes, such as gene acquisition and deletion, positive selection, homologous recombination and changes in population size. In particular, acquisition of non-homologous novel genes (e.g., pathogenicity islands, antibiotic resistance genes) by plasmid- or phage-mediated horizontal gene transfer, has been demonstrated to play a critical role in the evolution of Salmonella [8, 13]. Previous studies have also indicated that the level of gene degradation and gene deletion loosely correlates with the degree of host specificity displayed by particular Salmonella serotypes [14]. Loss of key metabolic functions has specifically been observed in many host-restricted serotypes, such as S. Typhi and S. Paratyphi A (human), and S. Gallinarum (fowl) [14,15,16]. Besides gene acquisition and deletion, positive selection and homologous recombination have been shown to play important roles in the serotype divergence and adaptive evolution in Salmonella as well [15, 17, 18]. For example, a total of 41 Salmonella genes reported by [15] showed evidence for positive selection, including genes likely contributing to virulence. In addition, S. Typhi and S. Paratyphi appear to have experienced a burst of recombination involving a quarter of their genes during the course of their adaptation to a highly virulent and human-specific lifestyle [19]. Changes in effective population size (Ne) can also have important impacts on emergence and diversification of bacterial lineages [20]. For example, multiple independent population expansions have been reported to lead to radial clusters of haplotypes among S. Typhi in Asia and Africa [21].

While a number of studies have explored the evolution of Salmonella, many of these studies have focused on a few specific serotypes (often S. Typhi) [16, 21], and few studies [6, 7, 16, 22] have specifically used genomics approaches to explore the evolutionary history and population genetic structure of AMR and multidrug-resistant (MDR) Salmonella, especially the potential influence of processes such as positive selection on the evolution of its antibiotic resistance. A few specific Salmonella serotypes are of a particular concern with regard to emergence and spread of MDR strains, including, but not limited to, serotypes Typhimurium, Newport, and Dublin, which represent the three serotypes this study focused on. As a pathogen infecting a wide range of hosts, S. Typhimurium is one of the leading causes of gastroenteritis in humans, and it is able to produce systemic infection in calves, sheep, goats, and pigs [9, 23, 24]. MDR S. Typhimurium phage type DT 104 represents one of the first MDR Salmonella clonal groups that was considered as a particular public health concern. This clonal group is characterized by a typical pattern of penta-resistance to ampicillin, chloramphenicol, streptomycin, sulfonamide, and tetracycline, and appears to have emerged in the United Kingdom as early as 1980 with global dispersal since the 1990s [25, 26]. In addition, another MDR Typhimurium clonal group mainly represented by phage type DT 193 has been described [27]. Most DT 193 isolates show the same pattern of penta-resistance as DT 104 except for kanamycin resistance in DT 193 (instead of chloramphenicol resistance in DT 104) [27]. S. Newport is a non-host-adapted pathogen that has been linked to many human gastroenteritis cases in both the United States and Europe in the past decades [18, 28]. S. Newport has been shown to be polyphyletic and S. Newport populations have been reported to have a geographic structure [2]. In a previous study, S. Newport strains isolated from Europe were more likely to belong to Lineage I, while strains from North America were more likely to belong to Lineages II and III [18]. A widespread emergence of Newport-MDRAmpC strains has been documented in the United States, which largely contributed to a 5-fold increase in the prevalence of Salmonella resistant to expanded-spectrum cephalosporins between 1998 and 2001 [29]. As a host-adapted serotype, S. Dublin is highly adapted to cattle and rarely infects humans [14]. In Japan, the prevalence of S. Dublin has drastically increased after the acquisition of a resistance (R)-plasmid in the early 1980s; this plasmid confers resistance to multiple antibiotics, including ampicillin, kanamycin, and nalidixic acid. [30]. In the United States, the incidence rate for human infection with S. Dublin increased more than that for infection with other serotypes, and a higher proportion of isolates were resistant to > 7 classes of antibiotics during 2005–2013 than during 1996–2004 [31]. Consistent with the host specificity of S. Dublin, previous studies suggested that at least some antimicrobial resistance traits in S. Dublin, such as resistance to nalidixic acid, were acquired within the bovine reservoirs [30, 32]. While our study reported here focused on characterization of AMR S. Typhimurium, S. Newport, and S. Dublin strains, emergence of AMR and MDR has also been described in other serotypes including Heidelberg and Paratyphi B [33].

In order to better understand the serotype-specific microevolution patterns of AMR S. enterica, we employed comparative genomic, evolutionary and phylogenetic approaches to further analyze previously reported [7] genome sequences for 90 AMR S. Dublin, S. Newport, and S. Typhimurium isolates from dairy cattle and humans from Washington state and New York state. The specific aims of this study were i) to characterize the evolutionary history of AMR S. Dublin, S. Newport, and S. Typhimurium; ii) to assess the relationship between genome content, including AMR genes and putative prophage genes, and serotype; iii) to identify genomic characteristics associated with the narrower host range of S. Dublin compared to other two serotypes; iv) to explore the influence of population dynamics, homologous recombination, and positive selection on the evolution of AMR S. enterica.

Results

Pan and core genomes, and AMR and prophage gene diversity

While the 90 genomes analyzed here were previously reported and used to assess associations between isolate sources and selected phenotypic (e.g., AMR) and genetic characteristics (e.g., presence/absence of AMR genes and plasmid replicons) [7], these genomes had not been previously annotated. In order to facilitate in-depth evolutionary and population genetics analyses of these genomes, we initially annotated these 90 genomes. Annotation identified a total of 7077 orthologous genes across the three serotypes (Additional file 1: Table S1). The pan genome of AMR S. Typhimurium was larger (by > 1000 orthologous genes) than that of S. Dublin and S. Newport, and the core genome of AMR S. Typhimurium was smaller (by approximately 300 orthologous genes) than that of the other two serotypes (Additional file 1: Table S2). According to the accumulation curves of the pan and core genome (Fig. 1a and b), S. Typhimurium displayed steeper slopes with an increasing number of sampled genomes affecting the number of genes in the pan and core genomes more drastically in this serotype, as compared to S. Dublin and S. Newport. Non-metric multidimensional scaling clustered isolates by serotype and showed a looser clustering of S. Typhimurium isolates, indicating that the three serotypes have very different gene presence/absence patterns and that S. Typhimurium has a more diverse gene composition (Fig. 2). The genome size of AMR S. Dublin ranged from 4.92 to 5.02 Mb; that of S. Newport ranged from 4.84 to 5.03 Mb; and that of S. Typhimurium ranged from 4.87 to 5.25 Mb (Additional file 1: Table S3).

Fig. 1
figure 1

Accumulation curves of a pan genome and b core genome of AMR S. Dublin (in blue), S. Newport (in brown) and S. Typhimurium (in red). The vertical bars indicate the standard deviations based on 100 repetitions with randomization of the order of the genomes

Fig. 2
figure 2

Non-metric multidimensional scaling ordination of AMR S. Dublin (in red), S. Newport (in green) and S. Typhimurium (in blue) isolates based on gene presence/absence

Consistent with previous genome analysis [7], 41 different AMR genes were identified among the three serotypes (Additional file 1: Table S4). While genes encoding penicillin-binding proteins (PBPs) were included as AMR genes in [7], they were excluded here. PBP is the target of β-Lactams and although mutations in PBP genes have been shown to confer resistance to β-Lactams in S. enterica [34], the presence of PBP genes is not associated with resistance to β-Lactams. S. Typhimurium isolates exhibited a large range in the number of AMR genes with 3 to 22 annotated AMR genes in a given isolate (Additional file 2: Figure S1) and a higher overall diversity of AMR genes (Shannon-wiener index: 3.01) as compared to S. Dublin and S. Newport (Shannon-wiener index of 1.73 and 1.05, respectively). A total of 732 prophage protein groups (PPGs) were identified, including 44 PPGs present in all 90 genomes (Additional file 1: Table S5). AMR S. Typhimurium genomes showed a wider range in the number of annotated PPG found in a given isolate, as compared to S. Dublin and S. Newport (Additional file 2; Figure S2).

Population dynamics

In order to estimate population changes over time, four Markov chain Monte Carlo (MCMC) population models, built based on single nucleotide polymorphism (SNP) data, were compared. Core SNPs were identified within each of the three serotypes including 2725 in S. Typhimurium, 128 in S. Dublin and 253 in S. Newport (numbers differed by 2 to 10 core SNPs from those previously reported by [7] as a different SNP filtering tool was applied). Among the four models evaluated, the Bayesian skyline population model, which estimates a posterior distribution of the Ne through time under a specified nucleotide-substitution model, showed the best fit for the population scenarios of AMR S. Dublin, S. Newport and S. Typhimurium. Bayesian skyline plots, which show the changes in the Ne over time, suggest a recent Ne reduction for S. Dublin (Fig. 3a). The Ne of S. Newport remained relatively constant over time, until very recently when it showed a noticeable decrease (Fig. 3b). The Ne of ancestral S. Typhimurium exhibited some fluctuations, followed by gradual stabilization with a constant Ne (Fig. 3c). Overall, the current AMR S. Typhimurium Ne is estimated to be approximately 10- and 50-fold greater than that for S. Newport and S. Dublin, respectively (Fig. 3).

Fig. 3
figure 3

Bayesian skyline plot indicating changes in the effective population size (Ne) of a AMR S. Dublin, b S. Newport and c S. Typhimurium over time with a relaxed molecular clock. The shaded area represents the 95% confidence intervals

Serotype-associated accessory genes and GO terms, and pseudogene distribution

Among the 3440 accessory genes, 1725 were found to be serotype-associated (i.e., significantly over- or under- represented in a given serotype). A total of 445, 360, and 427 genes were significantly over-represented in AMR S. Dublin, S. Newport and S. Typhimurium isolates, including 126, 80, and 83 genes present in all AMR S. Dublin, S. Newport and S. Typhimurium, respectively (designated here as “specifically present” genes; Fig. 4a). A total of 266, 227 and 235 genes were significantly under-represented in AMR S. Dublin, S. Newport and S. Typhimurium, including 108, 49, and 34 genes absent in all AMR S. Dublin, S. Newport and S. Typhimurium, respectively (designated here as “specifically absent” genes; Fig. 4a). AMR S. Dublin had both the largest number of genes specifically present (including the gene encoding GntR family transcriptional regulator and multiple genes encoding type VI secretion proteins) as well as the largest number of genes specifically absent (including genes encoding some cytoplasmic proteins and transporters) (see Additional file 1: Table S6).

Fig. 4
figure 4

The over- and under- representation of a orthologous genes and b GO and EC terms in AMR S. Dublin, S. Newport, and S. Typhimurium. Orthologous genes and GO/EC terms over- / under- represented in one serotype are defined as the ones identified as significant in given comparisons to the other two serotypes (FDR < 0.05, odds ratio >6.71 or < 0.15, respectively, for over- / under- represented)

More specifically, distribution of AMR genes and putative prophage genes also differed among AMR S. Dublin, S. Newport and S. Typhimurium isolates. aadB and cmlA were significantly enriched in AMR S. Dublin, while blaCARB, sulI, aadA, tetRG, and tetG were significantly enriched in AMR S. Typhimurium (Table 1). blaTEM-1D was significantly under-represent in AMR S. Newport, while blaCMY, sulII, tetA, tetR, strA, and strB were significantly under-represented in S. Typhimurium (Table 1). Significant association of putative prophage genes (based on clustering of genes into PPGs) with serotypes (see Additional file 1: Table S9) seems to represent the presence of specific prophages in different serotypes. Specifically, AMR S. Dublin was dominated by PPG genes annotated as belonging to prophages Salmon_ST64B, Entero_Fels_2 and Entero_P22; AMR S. Newport was dominated by Gifsy_1, Salmon_SEN34 and Entero_PsP3, while AMR S. Typhimurium was dominated by Entero_ST104 and Salmon_ST64B (Additional file 1: Table S9).

Table 1 AMR genes significantly over- and under-represented in AMR S. Dublin, S. Newport, and S. Typhimurium

At the gene ontology level, 4229 gene ontology (GO) terms (1878, 2020, and 331 classified into the molecular function, biological process, and cellular component categories, respectively) and 945 enzyme commission (EC) numbers were assigned to genes present in the 90 genomes (listed in Additional file 1: Table S7). While AMR S. Newport had the lowest number of both under- and over-represented GO terms and EC numbers, S. Typhimurium had the highest number of over-represented GO and EC terms and S. Dublin had the highest number of under-represented GO and EC terms (Fig. 4b, Additional file 1: Table S8). More specifically, a total of 14, 7, and 22 GO terms or EC numbers were only found (“specifically present”) in AMR S. Dublin, S. Newport, and S. Typhimurium, respectively, while 12, 0, and 2 GO terms or EC numbers were absent (“specifically absent”) in S. Dublin, S. Newport, and S. Typhimurium, respectively. Specifically present GO terms of particular interest included Type I site-specific deoxyribonuclease activity (GO:0009035), protein N-linked glycosylation via asparagine (GO:0018279), and antimonite transport (GO:0015699) in S. Newport, and inositol catabolic process (GO:0019310), response to hydrostatic pressure (GO:0051599), sphingolipid metabolic process (GO:0006665), and regulation of autophagy (GO:0010506) in S. Typhimurium. Specifically absent GO terms of particular interest included arginine transport (GO:0043858, GO:1902023) and ribulokinase activity (GO:0008741, EC:2.7.1.16) in S. Dublin.

Interestingly, AMR S. Dublin not only showed the highest number of specifically absent genes and GO terms, but also showed a significantly higher number of pseudogenes (average of 191 pseudogenes) as compared to AMR S. Newport and S. Typhimurium genomes (averages of 114 and 123 pseudogenes, respectively) (Additional file 2: Figure S3).

Phylogeny and serotype-clustered core genes

Consistent with the previous study [7] on the 90 genomes analyzed here, a bootstrapped maximum likelihood (ML) tree, based on the 36,075 core SNPs identified across the 90 AMR Salmonella isolates, showed three well-supported major clades consistent with serotype assignment (Fig. 5a). A phylogeny constructed for the AMR S. Newport genomes included here, along with reference S. Newport genomes representing isolates of Lineage II (including sub-lineage A, B, C) and Lineage III [2], shows that all AMR S. Newport isolates included in this study belong to Lineage IIC (Additional file 2: Figure S5). Furthermore, comparison of the multilocus sequence typing (MLST) sequence types (STs) for the 37 AMR S. Typhimurium isolates studied here and DT 104 and DT 193 reference isolates showed that 30 of the 37 S. Typhimurium isolates have the same ST as the DT 104 reference isolates (ST-19), while 2 isolates have the same ST as the DT 193 reference isolate (ST-34) (Additional file 1: Table S10).

Fig. 5
figure 5

a Phylogenetic tree inferred by maximum likelihood method using the core genome SNPs of 90 S. enterica isolates. Tree is rooted by midpoint. Bootstrap values > 70% are presented on the tree. S. Dublin is indicated by green, S. Newport by blue, and S. Typhimurium by red. b Proportion of core genes with different phylogenetic clustering patterns. Genes showing a phylogenetic clustering by serotype on the tree are indicated in orange. Genes showing phylogenetic clustering of only one serotype (i.e., isolates of one serotype were grouping together while those of other two were not) are indicated in grey. Genes showing no particular clustering pattern are indicated in yellow. Conserved genes (i.e., all sequences were identical) are indicated in blue

Individual ML trees for 2381 of the 3637 core orthologous genes also showed a phylogenetic clustering by serotype (these genes were designated as “serotype-clustered core genes”), indicating that a large fraction of core genes (65%) are serotype-clustered (Fig. 5b). Another 908 core genes exhibited a tree topology where isolates of one serotype clustered together, while isolates of the other two serotypes did not cluster by serotype. A total of 179 core genes did not show any particular pattern of clustering by serotype and another 169 core genes had identical sequences among all 90 isolates. Among trees constructed for the 19 AMR genes that were found in at least three genomes, only the tree for the core gene aac (6′)-Iaa, which encodes an enzyme that acetylates aminoglycosides, displayed clustering by serotype, supported by high bootstrap values (Additional file 2: Figure S4).

Homologous recombination

The number of putative genome-wide recombination events detected by Gubbins differed among serotypes with 26, 1, and 0 intra-serotype recombination events identified in AMR S. Typhimurium, S. Newport, and S. Dublin, respectively (Additional file 1: Table S11). In addition to the genome-wide recombination analysis, homologous recombination was examined for each of the orthologous genes (i.e., both core and accessory orthologous genes) using the PHI test and the program RDP4. A total of 42 genes showed strong evidence for recombination, defined as detection of a recombination signal with PHI as well as at least 4 of the 7 methods included in RDP4 (Table 2). These 42 genes included two core genes, one encoding a major facilitator superfamily (MFS) transporter and one encoding an oxidoreductase FeS-binding subunit. The remaining 40 genes were accessory genes, including 35 serotype-associated genes (Table 2) with 17 of these 35 genes clearly annotated as encoding prophage-relevant proteins, such as phage tail proteins, capsid proteins, protein NinG, baseplate assembly protein, and recombinases. The 35 serotype-associated genes with evidence for homologous recombination also included genes annotated as having functions related to conjugal transfer (e.g., the conjugal transfer protein TrbC) and DNA methylation (e.g., methyltransferase, DNA methylase).

Table 2 Orthologous genes with evidence of homologous recombination detected by both PHI and RDP4

Positive selection

A total of 41 genes (11 core genes and 30 accessory genes) showed evidence for diversifying selection across all serotypes, as detected by the site model in PAML (FDR <  0.05) (Additional file 1: Table S12); none of these 41 genes showed evidence for recombination. Eight of the 11 core genes with evidence for positive selection were serotype-clustered (defined here as showing clustering by serotype in the gene tree topology), and 20 of the 30 accessory genes with evidence for positive selection were serotype-associated (defined here as being significantly over- or under- represented in one serotype). The 41 genes with evidence for diversifying selection across all serotypes included 13 genes encoding cell surface proteins (Additional file 1: Table S12). These proteins included the cell envelope integrity protein TolA, the outer membrane protein assembly factor BamA, the conjugal transfer protein TraH, the type-IV secretion system protein TraC, different permeases and transporters, and fimbriae, pili, flagella-related proteins as well as porins and the phosphoporin PhoE. Additional genes that showed evidence for diversifying selection across all serotypes included three genes encoding proteins secreted by type VI secretion system as well as the genes encoding the virulence factor YopJ, a colicin-like toxin, and a chloramphenicol efflux MFS transporter (Additional file 1: Table S12).

Tests for diversifying selection within a given serotype identified 18 genes with evidence for positive selection in AMR S. Typhimurium (FDR <  0.05). Genes encoding a phage tail protein, an anion permease, and a flagellin showed evidence for strong positive selection, indicated by an overall dN/dS (ω) value > 500 (Additional file 1: Table S12). In addition, about half of the sites in pilJ, which encodes a type IV pilus biogenesis protein, showed significant evidence for diversifying selection (frequency p = 47.14%, ω = 22.43) (Additional file 1: Table S12). Diversifying selection within AMR S. Newport was observed in 8 genes (FDR < 0.05); the strongest selection pressure was found in the core gene encoding a DNA-binding transcriptional regulator, with ω ~∞, p = 4.00% (Additional file 1: Table S12). No genes showed evidence for diversifying selection within AMR S. Dublin.

To identify whether directional selection has played a role during the divergence of the three serotypes, 2381 serotype-clustered core genes were tested using the branch-site model in PAML. These analyses identified 5, 8 and 7 serotype-associated core genes with evidence for directional selection in AMR S. Dublin, S. Newport and S. Typhimurium ancestral branches, respectively (Table 3). The ω values for genes undergoing directional selection were generally high (Table 3). The 5 genes with evidence for directional selection on the AMR S. Dublin branch (Table 3), included the genes encoding the flagellar M-ring protein FliF, and the fimbrial protein StiA. The 8 genes with evidence for directional selection on the AMR S. Newport branch (Table 3) included invA (encoding the invasion protein facilitating bacterial host cell invasion) and the gene encoding the outer membrane protein assembly factor BamA (Table 3). Directional selection on the AMR S. Typhimurium branch was detected in 7 genes, including genes encoding a porin, a E3 ubiquitin--protein ligase, and a D-serine/D-alanine/glycine transporter (Table 3). The gene encoding an autotransporter outer membrane beta-barrel domain-containing protein showed evidence for directional selection in all three serotype ancestral branches (Table 3) and also showed evidence for diversifying selection across all serotypes (Additional file 1: Table S12).

Table 3 Serotype-associated core genes undergoing positive selection specifically on AMR S. Dublin, S. Newport or S. Typhimurium ancestral branches

Discussion

While a number of S. enterica serotypes include AMR strains and lineages, AMR strains representing serotypes Dublin, Newport, and Typhimurium are of particular public health importance and represent a range of host specificity categories, providing a model for developing a better understanding of the evolution and population genetics of AMR S. enterica. Through in-depth analysis of 90 isolates representing AMR S. Dublin, S. Newport and S. Typhimurium from humans and bovines from Washington state and New York state, we demonstrated that (i) AMR S. Dublin, S. Newport and S. Typhimurium exhibited distinct genomic characteristics and evolutionary patterns; (ii) genome content of AMR S. enterica, including AMR genes, was strongly associated with serotype; (iii) positive selection mainly targeted genes encoding cell surface proteins and genes likely to function in virulence and antimicrobial resistance, while homologous recombination mainly acted on prophage-associated genes. Overall, our data suggest that evolution of AMR characteristics in S. enterica shows serotype-specific patterns and may involve positive selection in antimicrobial resistance-related genes, in addition to acquisition of AMR genes through horizontal gene transfer.

AMR S. Dublin, S. Newport and S. typhimurium display distinct genomic characteristics and evolutionary patterns

AMR S. Dublin isolates included in this study displayed a relatively large core genome and small pan genome, showed low diversity of AMR genes (i.e., presence of a few distinct AMR genes), and encoded a few prophages including Salmon_ST64B, Entero_Fels_2 and Entero_P22, which appeared to be associated with AMR S. Dublin isolates. These genomic characteristics might be explained by the relatively small Ne, a recent population bottleneck, clonal population structure (i.e., infrequent intra-serotype homologous recombination), and limited diversifying selection observed here in AMR S. Dublin. This observation is consistent with the evolution being mainly driven by genetic drift, coupled with negative selection in bacteria with small Ne [12, 20]. Compared to AMR S. Newport and S. Typhimurium, S. Dublin genomes contained significantly more pseudogenes, and more orthologous genes and gene ontologies specifically absent from this serotype, suggesting genome decay, which are documented characteristics in host-adapted pathogens [12, 14, 35]. The functions of genes and gene ontologies specifically absent in AMR S. Dublin (e.g., arginine:ornithine antiporter activity, L-arginine transport, UDP-galactopyranose mutase activity, teichoic acid transport activity, ribulokinase activity) may not be required for growth and survival in the host niches typically occupied by S. Dublin. For example, a previous study reported that the arginine-ornithine antiporter is crucial for supplying external arginine as substrate to the Arginine Deiminase System (ADS), which contributes to acid resistance through production of ammonia. Consequently, for many pathogens the ADS system has been linked to virulence and fitness in the host [36]. Specifically, ADS-mediated production of ammonia has been shown to significantly prolong the intracellular survival of different bacteria, including Staphylococcus epidermidis and Streptococcus suis [36, 37]. As S. Dublin seems to be adapted to cattle, which have a rumen pH (ranging between 5.7 and 7.3) [38] much higher than that of human stomach lumen (1.5 to 3.5) [39], acid tolerance may be less important in S. Dublin, resulting in the loss of genes participating in the ADS-dependent arginine catabolism. In addition, L-ribulokinase participates in one of the two L-arabinose catabolism pathways to generate L-ribulose 5-phosphate [40]. As L-arabinose has been shown to regulate virulence gene expression in S. enterica [41], loss of one pathway of L-arabinose may affect virulence gene expression in S. Dublin.

Similar to AMR S. Dublin, AMR S. Newport genomes displayed a relatively low level of diversity, which might be a result of the likely-emergence of antimicrobial resistance in one single lineage, Lineage IIC, with all isolates tested here classified into this lineage. Also, the low genomic diversity may be due to a recent population size reduction, a relatively clonal population structure, and the low frequency of positive selection detected in AMR S. Newport in this study. In addition, geographic barriers might contribute to the low level of diversity observed in our data set; this hypothesis is supported by the association between geographic location and different Newport lineages observed in previous MLST and whole genome sequencing (WGS) studies [2, 18, 42, 43]. The GO terms “Type I site-specific deoxyribonuclease complex and activity”, “glucosyltransferase activity”, “protein N-linked glycosylation via asparagine”, and “antimonite transport” were specifically present in AMR S. Newport. These GO terms may be particularly important for the growth and survival of S. Newport in diverse hosts and environments. For example, restriction enzymes are the primary bacterial defense against lytic phages. Therefore, the presence of type I site-specific deoxyribonuclease complex and activity specifically in AMR S. Newport may represent a specific defense system against lytic phages in this serotype. In addition, glycans resulting from asparagine-linked glycosylation play a crucial role in various biological processes, such as protein folding, cellular targeting and motility, and immune response, in all three domains of life [44]. Hence, having the N-linked glycosylation via asparagine function may increase the fitness of S. Newport in host niches. Another interesting GO term specifically detected in AMR S. Newport is the biological process of antimonite transport. Antimonite is the salt of antimony (Sb(III)) and is toxic to cells [45]. Existence of antimonite transport in AMR S. Newport suggest that AMR S. Newport may have the ability to pump antimonite out of a cell, consequently conferring antimonite resistance. Interestingly, it has been suggested that microorganisms with multiple heavy metal resistances (including antimonite resistance) may also show resistance to some antibiotics due to co-location of heavy metal and antimicrobial resistance genes on the same mobile elements [46,47,48].

AMR S. Typhimurium exhibited a high level of genetic diversity, indicated by a relative open pan genome, a small core genome, a high diversity of AMR genes, and large variation in the number of AMR genes and prophage genes per genome. The high genetic diversity of AMR S. Typhimurium might result from its large Ne, relatively stable Ne over time, panmictic population structure (i.e., frequent homologous recombination), and relatively frequent positive selection observed in this study. These observations are consistent with the hypothesis that in large populations, selection overpowers genetic drift [20]. In addition, since we specifically selected AMR isolates for this study, the large pan genome size and diverse genome content of AMR S. Typhimurium may be associated with the emergence of antimicrobial resistance in multiple S. Typhimurium lineages, unlike the other broad-host serotype S. Newport, where antimicrobial resistance seems to have emerged in a single lineage (IIC). Specifically, at least two MDR S. Typhimurium groups have emerged in the past decades, DT 104 and DT 193 [27]. Based on the comparison with the MLST sequence type for DT 104 and DT 193 reference isolates, our dataset appears to include isolates belonging to at least two MDR Typhimurium lineages. A total of 22 GO and EC terms (e.g., inositol catabolic process and sphingolipid metabolic processes, which were shown to contribute to pathogenesis in mammalian hosts [49, 50]), were found to be specifically present in AMR S. Typhimurium. The functions associated with these gene ontologies might help S. Typhimurium to successfully compete in different niches and hosts. Notably, the GO term “regulation of autophagy” was also found to be specifically present in AMR S. Typhimurium. Autophagy is induced in the host to combat infection with various pathogenic bacteria, in which a double-membrane structure – autophagosome – engulfs invading pathogens and brings them to the lysosome for degradation [51, 52]. However, S. Typhimurium has been shown to exploit eukaryotic autophagy machinery during its intracellular life style in the host by inducing autophagic response to enter the host cell and suppressing autophagy within 3 h of infection [53]; our data suggest that Salmonella serotypes may differ in their ability to suppress autophagy as genes involved in regulation of autophagy function were only found in S. Typhimurium.

Genome content of AMR S. enterica, including some AMR genes, is strongly associated with serotype

Even though serotypes within a bacterial species are defined based on cell surface antigens, genome content is typically correlated to serotype [10, 11]. Our data showed that about half of the accessory genes were “serotype-associated” (i.e., significantly over- or under-represented in a given serotype) and the majority of core genes were “serotype-clustered” (i.e., phylogenetic clustering by serotype). The strong association between AMR S. enterica genome content and serotype could be explained by the periods of elevated serotype diversification (e.g., due to host immune response) followed by long time of relatively lower but constant diversification observed in Salmonella [54].

Consistent with a previous analysis of the genomes studied here [7], different AMR genes were found to be associated with Salmonella serotypes. The association between AMR genes and AMR S. Typhimurium was especially strong, indicated by 5 AMR genes (blaCARB, sulI, aadA, tetRG, and tetG) significantly enriched and 6 AMR genes (blaCMY, sulII, tetA, tetR, strA, and strB) significantly under-represented in this serotype. blaCARB, sulI, aadA, tetRG, and tetG have been previously reported to be significantly associated with plasmid replicon IncFII(S) [7]; IncFII plasmids carrying AMR genes have previously been reported for some S. Typhimurium strains [55]. blaCMY, sulII, tetA, tetR, strA, and strB were previously reported to be significantly associated with the plasmid replicon IncA/C2 [7]. These resistance genes had previously been detected on a IncA/C2 plasmid in S. Newport [56] and were found in all S. Newport isolates analyzed here. Interestingly, the only AMR gene present in all 90 genomes, aac (6′)-Iaa, displayed a robust clustering by serotypes in the gene tree. AAC (6′)-Iaa is a chromosomal-encoded aminoglycoside acetyltransferase, which effectively acetylates tobramycin, kanamycin, and amikacin [57]. This result indicates that aac (6′)-Iaa might have been introduced into an ancestral strain of S. Dublin, S. Newport and S. Typhimurium before serotype diversification and subsequently transmitted vertically, through chromosome replication.

While homologous recombination mainly acts on prophage-associated genes, positive selection mainly targets genes encoding cell surface proteins and genes likely to function in virulence and antimicrobial resistance in AMR S. enterica

Homologous recombination and positive selection have been shown to play critical roles in the evolution of bacteria including Salmonella [15, 58, 59]. In this study of AMR Salmonella, a number of genes showed evidence for homologous recombination or positive selection. Most of these genes were serotype-associated, suggesting the contribution of homologous recombination and positive selection in serotype diversification.

Most of the serotype-associated accessory genes showing evidence for homologous recombination encode prophage-associated proteins, such as phage tail proteins, capsid proteins, the protein NinG, a baseplate assembly protein, and recombinases. This result supports phage-mediated homologous recombination as one important mechanism in the remodeling of the bacterial genome [60]. Moreover, this result suggests that detectable homologous recombination is limited outside of prophage sequences. This could be due to Salmonella’s limited ability to acquire foreign DNA through transformation [61] or the low chromosomal diversity within serotype, which renders few nucleotide polymorphisms within recombinant fragments [60].

Congruent with previous observations that cell surface proteins are main targets of positive selection in both eukaryotes and prokaryotes [15, 62,63,64,65,66], most genes showing evidence for positive selection in this study encode cell surface proteins, such as outer membrane proteins, transporters, permeases, porins, cell surface appendages. Notably, a number of the proteins found here to show evidence for positive selection have reported or plausible functions in virulence, including genes encoding fimbriae, pili, flagella-related proteins [67, 68]. In addition, an autotransporter outer membrane beta-barrel domain-containing protein and the outer membrane protein assembly factor BamA, which both participate in the beta-barrel assembly, showed evidence for positive selection and have been proposed to play direct and indirect roles in virulence in Gram-negative bacteria [69]. Genes where positive selection may contribute to antimicrobial resistance include those encoding TraC and TraH, a chloramphenicol efflux MFS transporter, and a number of porins, including PhoE. Briefly, the VirB5-like TraC protein and VirB8-like TraH protein, which are two type-IV secretion system proteins, are virulence factors that may be targeted by antibiotics [70,71,72]. The chloramphenicol efflux MFS transporter functions as an export pump for the antimicrobial chloramphenicol [73]. Porins such as OmpF, OmpC, and PhoE, play an important role in allowing influx of β-lactam antibiotics in Gram-negative bacteria, which is important to allow these and potentially other antibiotics to access their targets [74]. As previous studies have indicated that specific mutations in porins may contribute to increased resistance to β-lactam antibiotics [15], our findings suggest that positive selection in porins could contribute to AMR phenotypes and adaptation of Salmonella to antibiotics. Porins also may play a role in virulence and have been reported to inhibit leukocyte phagocytosis by activating the adenylate cyclase system in S. Typhimurium [75].

Genes that were found to have evidence for positive selection, but do not encode cell surface proteins, include genes encoding proteins that appear to contribute to bacterial virulence (e.g., E3 ubiquitin-protein ligase, effector protein YopJ, invasion, D-alanine transporter) or bacteria-bacteria interactions (e.g., type IV secretion protein Rhs, and colicin) [76,77,78,79]. Interestingly, two of these proteins (E3 ubiquitin-protein ligase and D-alanine transporter) were also identified as being gradually lost in S. Cerro, which appears to show reduced human virulence [80]. In addition, some prophage-related genes, such as phage tail gene and baseplate gene, showed evidence for positive selection. Most interestingly, diversifying selection across all three serotype was detected in a gene encoding a 23S rRNA (guanine (745)-N (1))-methyltransferase. This methyltransferase methylates the 23S rRNA of Gram-negative bacteria at nucleotide G745, which is located at the peptide exit channel of the ribosome. This same site has been shown to be the binding site of macrolide, lincosamide, and streptogramin B antibiotics [81]. While it is not clear whether the methylation of G745 confers resistance to these antibiotics, mutation in this methyltransferase gene have been shown to increase resistance of Escherichia coli to the ribosome binding antibiotic viomycin [82]. Hence, positive selection detected in the gene encoding a 23S rRNA (guanine (745)-N (1))-methyltransferase might contribute to an increased resistance of S. enterica against some ribosome-targeted antibiotics. Overall, these findings suggest that positive selection may not only contribute to adaptation to stresses encountered in hosts and environmental niches, but also may play a role in the evolution of antimicrobial resistance.

Conclusions

By performing a comprehensive genome-wide analysis, we show that antimicrobial-resistant S. Dublin, S. Newport and S. Typhimurium exhibited distinct evolutionary patterns and genomic characteristics. AMR S. Typhimurium showed a large and diverse genome and frequent positive selection and homologous recombination, consistent with multiple emergence events of AMR strains and/or effective dispersal and ecological success of this serotype in different niches. AMR S. Dublin showed evidence for experiencing a recent population bottleneck (similar to Yersinia pestis [83] and Mycobacterium tuberculosis [84]) and genome decay, which are consistent with its narrow host range. While our data suggest that homologous recombination and positive selection drive the evolution in some serotype-associated genes, suggesting a critical role of these evolutionary mechanisms in serotype diversification of S. enterica, this hypothesis warrants further investigation since only AMR isolates were included in this study. Importantly, a focus on analysis of AMR Salmonella isolates supports the importance of a range of mechanisms in the evolution of antimicrobial resistance and the emergence of AMR lineages, in addition to horizontal transfer of well characterized antimicrobial resistance genes (e.g., genes encoding beta lactamases). For example, while positive selection in target proteins of antibiotics (e.g., gyrase) is well documented [85], our study suggests that positive selection in a wider range of genes, including genes encoding target modification enzymes (e.g., 23S methyltranferases) and genes encoding proteins that facilitate target access (e.g., porins), may contribute to the evolution of antimicrobial resistance. Future efforts on understanding the evolution, spread, and maintenance of antimicrobial resistance in additional Salmonella serotypes and other human and zoonotic pathogens with different host ranges and geographical locations are thus needed to characterize a variety of evolutionary mechanisms (e.g., horizontal gene transfer, mutations, positive selection), including their relative roles and dependencies. For example, some horizontally transferred plasmids may lead to intermediate- or low-level resistance that is not clinically relevant but enhances the risk of subsequent emergence of fully resistant subtypes via selection of mutations that lead to high-level resistance, as recently proposed for quinolone resistance [86]. In addition, positive selection in some genes may represent compensatory mutations that facilitate the fixation of AMR genes in populations [87]. While increasing availability of whole genome sequence data will facilitate more in-depth evolutionary studies of complex mechanisms underpinning the emergence and spread of antimicrobial resistance and AMR lineages, these studies will need to be supplemented by functional studies that define the phenotypic impact of gene acquisitions and mutations hypothesized to be linked to antimicrobial resistance. Future large WGS data sets as well as the data set analyzed here also provide a future opportunity to explore associations between geographical locations and host species and evolutionary patterns of interest, including AMR acquisition.

Methods

Isolation selection

A previously reported set of genome sequences for 90 S. enterica isolates [7] collected from the stool samples of human patients presenting clinical signs of salmonellosis (n = 49) and the fecal samples of bovine (n = 41) between 2008 and 2012 was used for this study, 45 of which were from New York state and 45 were from Washington state (Additional file 1: Table S3). Those isolates represent one of the three serotypes of interest (Dublin (n = 21), Newport (n = 32) and Typhimurium (n = 37)), as determined using the White-Kauffmann-Le Minor scheme [88] and had previously been tested to be resistant to at least one antimicrobial drug [7].

Whole genome sequencing and genome assembly

Methods of whole genome sequencing and genome assembly were detailed in [7]. The sequence data has been deposited in the National Center for Biotechnology Information’s (NCBI) Sequence Read Archive (SRA) under accession number SRP068320. Assembled genomes have been deposited at NCBI DDBJ/ENA/GenBank under the accession numbers listed in Additional file 1: Table S3.

Reference-free and reference-based variant calling, phylogenetic tree construction and recent population dynamics inference

The reference-free variant calling among all 90 genomes and phylogenetic tree construction followed the methods provided by [7]. To confirm the lineages of S. Newport isolates, genome assemblies of S. Newport str. CVM 19443 (GenBank Accession AHUB00000000), S. Newport str. CVM 19567 (AHTR00000000), S. Newport str. SL254 (CP001113), S. Newport str. SL317 (ABEW00000000) were downloaded from GenBank as reference genomes of S. Newport Lineage IIA, IIB, IIC, and III, respectively. An ML tree of 32 S. Newport genomes and 4 reference genomes was constructed following the same approach as for the ML tree of 90 genomes.

To maximize resolution, variant calling was performed within each serotype using the Cortex variant caller (cortex_var) [89] following the steps provided in [80]. For S. Typhimurium isolates, S. Typhimurium str. LT2 (RefSeq NC_003197.1) was used as a reference genome; for S. Newport isolates, S. Newport str. SL254 (GenBank Accession CP001113) was used as a reference; for S. Dublin isolates, the contigs of isolate BOV_DUBN_WA_10_R9_3233 in this study were used as a reference as detailed in [7]. A cleaning threshold inferred from the coverage distribution, a quality score threshold of 15, a cutting homopolymer threshold of 15, and a population filter to remove false calls were applied in running cortex_var. SNPs were filtered from other variants such as indels using vcftools version 0.1.5 [90], and SNPs in the recombination regions were filtered out using Gubbins version 1.4.2 [91].

The resulting high-quality SNPs were used to infer population dynamics for each serotype using BEAUti version 1.8.4 and BEAST version 1.8.4 [92]. The best nucleotide substitution models for SNPs within each serotype determined by MEGA7 [93] were specified in BEAUti version 1.8.4. The best model identified for S. Typhimurium was the general time reversible (GTR) model [94], while that for S. Newport and S. Dublin is the Kimura 2-parameter model [95]. An ascertainment bias correction was applied as SNPs were used [7] and base frequencies were estimated. The prior substitution rate for S. Typhimurium was set to 1 × 10− 6 /site/year [23]. S. Newport and S. Dublin were set to the average rate of bacteria (2.1 × 10− 7 /site/year) since the literature lacks estimations of substitution rates for either serotype. An MCMC algorithm was run for 100 million generations, with sampling every 10,000 generations, to estimates the posterior probability distributions of the genealogical and demographic parameters of a sample. Marginal likelihood estimations were computed by path sampling [96, 97] and stepping stone sampling using 100 steps with a chain length of 1 million generations, sampling every 1000 generations. For each serotype, combinations of either a (i) strict or (ii) lognormal relaxed molecular clock, and either a (i) coalescent constant size, (ii) exponent growth population size, (iii) Bayesian skygrid population or (iv) Bayesian skyline population were tested, and for each combination, three 100 million MCMC replicate runs were performed. Bayes factors, representing the ratio of the marginal likelihood of model, were compared among combination models. According to the Bayes factors associated with each molecular clock-population model combination, the best model for all three serotypes was the model of a relaxed molecular clock with a Bayesian skyline growth population. The log and trees files of converging individual replicate runs were combined in LogCombiner version 1.8.4 with a burn-in of 10,000,000 sampling every 100,000 states. The combined trees file was edited in FigTree version 1.4.2 with posterior probabilities being placed on the nodes. Effective sample sizes of run statistics examined by TRACER version 1.5 indicated convergence and adequate mixing of the Markov chains. Using TRACER version 1.5, the first 10% of each chain were discarded as burn-in and the Bayesian skyline plot indicating the population dynamics was reconstructed for each serotype.

In silico AMR gene and prophage detection, and multi-locus sequence typing of S. Typhimurium

AMR genes in all 90 assembled genome sequences were determined by nucleotide BLAST version 2.4.0 [98] and the formatted ARG-ANNOT database included with SRST2 [99]. The results of AMR genes were previously reported by [7].

The putative prophage-associated proteins were detected based on the sequence similarity of the protein-coding gene predicted by Prodigal version 2.60 [100] to a curated prophage protein dataset. The dataset, containing 265,986 prophage protein sequences, was downloaded in January 2016 from the PHAST server [101]. These prophage proteins were clustered into 167,195 groups by using CD-HIT version 4.6.5 [102] with a sequence similarity threshold of 90% identity. The 536 groups related to integrases or transposases were removed and the remaining 166,659 groups served as the PPG dataset used in this study. Protein BLAST was performed using proteins predicted by Prodigal version 2.60 against the PPG dataset. Query proteins with significant similarities (identities > 80% and query coverage > 80%) were classified as putative prophage proteins and subsequently clustered into the most similar PPG, resulting in 732 PPGs across the 90 genomes.

To determine the sequence type of AMR S. Typhimurium isolates analyzed in this study, two reference strains of phage type DT 104 (GenBank Accession NC_022569, GCA_002034985), one reference strain of phage type DT 193 – str. SO4698–09 (NZ_LN999997), and all the 37 assembled Typhimurium genomes from our dataset were analyzed using the MLST online service provided by Center for Genomic Epidemiology (https://cge.cbs.dtu.dk/services/MLST/).

Genome annotation, identification of orthologous genes, gene tree construction, and gene ontology and enzyme commission

Assembled genomes were annotated by the NCBI Prokaryotic Genome Annotation Pipeline [103]. OrthoMCL [104] was used to identify orthologous genes in the 90 annotated genomes. Amino acid sequences and nucleotide sequences of each ortholog were extracted from GeneBank using Genbank/EMBL to FASTA Conversion Tool (https://rocaplab.ocean.washington.edu/tools/genbank_to_fasta/). Amino acid sequences of each orthologous group were aligned using MUSCLE version 3.7 [105]. Amino acid sequence alignments were converted into nucleotide sequence alignments using PAL2NAL version 14 [106]. Gene trees of core orthologous genes and AMR genes were constructed with RAxML version 8.2.4 [107] under the general time-reversible model. All gene trees of orthologous genes were midpoint rooted and their branch lengths were removed in RStudio version 1.0.136 [108]. Customized Python scripts were used to detect clustering by serotypes in each gene tree. Core genes for which tree topology showed a clustering by serotype were defined as serotype-associated core genes. GO and EC terms were assigned to all 90 annotated genomes using Blast2GO v1.2.1 [109].

Homologous recombination detection

The Recombination Detection Program (RDP) and Pairwise Homoplasy Index (PHI) test were used to detect homologous recombination in individual orthologous genes. Cleaned-up alignments in which identical sequences were removed were used. The PHI test was performed in SplitsTree4 version 4.14.2 [110]. Seven homologous recombination detection methods (RDP, BOOTSCAN, GENECONV, MAXCHI, CHIMAERA, SISCAN, and 3Seq) incorporated in RDP4 [111] were further employed using the default settings. To reduce false positives, only recombination events detected by at least four of the methods used in RDP4 [112, 113] and also by PHI test in orthologous genes were considered to be significant events. Genome-wide recombination within each serotype were inferred by Gubbins version 1.4.2 [91] on concatenated sequence alignment using reference-based assembled genomes by cortex_var.

Positive selection detection

Nucleotide sequence alignments of protein coding genes containing isolates from all three serotypes and only isolates from each serotype were cleaned up by removing identical sequences. Alignments that had more than three non-identical sequences and contained at least one non-synonymous mutation were used for positive selection analyses using the PAML program version 4.8 [114]. Unrooted trees constructed by RAxML version 8.2.4 for each gene were used. Diversifying selection across codon sites among three serotypes and within each serotype was assessed using a likelihood ratio test (LRT) by comparing M1a model (nearly neutral) against M2a model [positive selection in a fraction of the sites (ω > 1)] (site model). Directional selection on specific ancestral branches to each of the three serotypes were assessed using LRT by comparing MA1 (positive selection model) against MA (neutral evolution model) (branch-site model) on core genes which showed clustering by serotype on the gene tree. Details of parameters used in each model were described in [115, 116]. The LRT statistic was calculated by 2 × ln (L0/L1), where L0 is the likelihood estimate for null model (neutral model) and L1 is the likelihood estimate for alternative model (positive selection model) which has more free parameters. The degree of freedom (df), which is calculated as the difference in the number of free parameters between the null and alternative models, was set to 2 for site model and 1 for site-branch model. Statistical significance was determined by approximating the test statistic to a χ2 distribution. A false discovery rate (FDR) correction was applied and genes with FDR < 0.05 were considered significant.

Statistics

The presence/absence of orthologous gene content table was used to perform the rarefaction curve estimation of core genome size and accumulation curve of pan genome size (i.e., the sum of core genes and accessory genes) estimation for each serotype using an R software script provided by [117]. The procedure was repeated 100 times, randomizing the order of the genomes every time to obtain the mean core and pan genome size estimates and standard errors. Non-metric multidimensional scaling was employed to compare the dissimilarity of gene presence/absence pattern among serotypes based on Bray-Curtis distance using RStudio version 1.0.136 [108]. Shannon-wiener index was calculated, using the abundance of AMR genes among isolates within each serotype to indicate the AMR gene diversity. Over-representation and under-representation of orthologous genes, GO terms, AMR genes and PPG was assessed using Fisher’s exact tests in RStudio version 1.0.136 [108] by comparing the presence/absence of features in one serotype against the other two. The FDR procedure [118] was applied to correct for multiple testing. Items significantly over- or under-represented in one serotype were defined as the ones identified as significant in given comparisons relative to the other two serotypes (FDR < 0.05, odds ratio >6.71 or  < 0.15, respectively, for over- / under- represented) [119]. The accessory genes that were significantly over- or under- represented in one serotype were defined as serotype-associated accessory genes. Pseudogenes were extracted from GenBank files of each isolate and Kruskal-Wallis test [120] was carried out to determine if the numbers of pseudogenes differed significantly among the three serotypes.