Background

Staphylococcus xylosus is a coagulase-negative, gram-positive coccus that is widespread in the environment and occurs as a commensal on the skin and mucosal surfaces of animals. Although S. xylosus is generally considered nonpathogenic, several studies have linked it to opportunistic infections in animals and humans. It has been recovered from a variety of environments, including contaminated water, meat, fodder, and soil [4, 15, 19, 27, 29, 31, 34, 36]. Coagulase-negative staphylococci (CoNS) can play an important role in bovine intramammary infections and can share mobile genetic elements carrying virulence factors, such as antibiotic resistance markers, with other staphylococcal species, including S. aureus [21, 34, 62]. Beyond its own clinical relevance, S. xylosus may contribute to the pathogenicity of other staphylococci through horizontal transfer of antibiotic resistance elements such as the SCCmec type 11 region and tetracycline resistance genes [33, 37]. Because S. xylosus is increasingly implicated in infections alongside other staphylococci, it is essential to investigate the genome of this ubiquitous commensal. Far fewer S. xylosus genomes than S. aureus genomes are publicly available, and none has been sequenced from Iraq. Although previous studies have annotated and analyzed S. xylosus genomes, a thorough exploration of the pathogenic potential of this bacterium based on next-generation sequencing (NGS) data is still needed, particularly to relate data from different geographic regions [29]. During an investigation of mastitis-causing agents in Basrah Governorate, Iraq, we identified an antibiotic-resistant strain of S. xylosus (a CoNS) in a milk sample from a cow with chronic mastitis.
Using the disc diffusion method, we found that this isolate was resistant to methicillin, ampicillin, cefoxitin, oxacillin, and tetracycline but sensitive to vancomycin. The strain also showed notable biofilm-forming capacity. To understand the genetic basis of these phenotypes, we sequenced the whole genome of Staphylococcus xylosus NM36 to identify genetic variants and to annotate key genes related to antibiotic resistance and biofilm formation.

Methods

S. xylosus NM36 was isolated from a cow with clinical mastitis. Isolation and identification were performed using standard microbiological procedures. Identity was confirmed by sequencing the PCR product of the 16S rRNA gene amplified with the universal primers 27F and 1492R [22], followed by BLAST analysis against the NCBI database [53]. Genomic DNA was extracted with the QIAamp DNA Mini Kit (Qiagen, USA; catalog number 51304) according to the manufacturer’s instructions. DNA samples were sent to Macrogen (Korea) for whole-genome sequencing on an Illumina platform. After quality control (QC), libraries were constructed by random DNA fragmentation followed by 5′ and 3′ adapter ligation. Adapter-ligated fragments were PCR-amplified and gel-purified. The library was loaded onto a flow cell for cluster generation, where fragments were captured on a lawn of surface-bound oligonucleotides complementary to the library adapters. Each fragment was then amplified into a distinct clonal cluster by bridge amplification, after which the templates were ready for sequencing. After sequencing, basic statistics (total bases, total reads, GC content (%), and overall read quality) were calculated for the raw reads. To reduce bias in downstream analysis, reads were checked with FastQC [3] and quality-filtered. Data quality was assessed from the percentages of bases with Phred quality scores of at least 20 (Q20) and at least 30 (Q30) at each cycle; the Phred score measures the accuracy of base calls produced by automated DNA sequencing [20]. The raw reads were de novo assembled into contigs using SPAdes v.3.5 [6].
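The Q20 and Q30 values are the percentages of base calls whose Phred score, Q = -10·log10(P), is at least 20 or 30, that is, an estimated error probability P of at most 1% or 0.1%. The sketch below shows that arithmetic on FASTQ quality strings. It assumes Phred+33 encoding (the Illumina 1.8+ convention), and the function name is hypothetical rather than part of FastQC or the sequencing provider's pipeline.

```python
def q20_q30(quality_strings, offset=33):
    """Return (Q20 %, Q30 %) over all base calls in FASTQ quality strings.

    Assumes Phred+33 encoding: ASCII code minus `offset` gives the Phred
    score Q = -10 * log10(P_error) of each base call.
    """
    total = q20 = q30 = 0
    for qual in quality_strings:
        for ch in qual:
            score = ord(ch) - offset
            total += 1
            if score >= 20:
                q20 += 1
            if score >= 30:
                q30 += 1
    if total == 0:
        raise ValueError("no base calls supplied")
    return 100.0 * q20 / total, 100.0 * q30 / total


# Toy quality strings: 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2
reads = ["IIII5", "5###I"]
print(q20_q30(reads))  # (70.0, 50.0)
```

In a real run the quality strings would be every fourth line of the filtered FASTQ files; the toy input only demonstrates the thresholds.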

Genome analysis and comparison with other genomes

The assembled genome of Staphylococcus xylosus NM36 was submitted to the PATRIC Comprehensive Genome Analysis service, which uses PATRIC’s curated collection of representative antimicrobial resistance (AMR) gene sequence variants [63]. For read mapping, a Staphylococcus xylosus reference genome (RefSeq assembly GCF_000709415.1; CP008724.1) was used. Filtered reads were mapped to the reference genome with BWA (Burrows-Wheeler Aligner) [35]. After read mapping, Picard (Broad Institute) and SAMtools were used to remove duplicate reads and identify variants [16, 47]. To complement the mapping results, the assembled genome was also annotated for functional genes in subsystem categories using the classic RAST and RASTtk pipelines [5, 11] and the SEED tool [45]. A similarity threshold of at least 95% identity was applied in all annotation and comparison steps. The annotated features were further verified and visualized with the PROKSEE server [25], which was used to identify conserved and unique sequence features and to generate high-quality maps as previously described [56]. Using the SEED tool, the NM36 genome was also compared with closely related genomes (ANMR00000000.1, CP007208.1, CP008724.1, CP031275.1, CP066721.1).
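The annotation servers apply their identity cut-offs internally. As an illustration only, the following sketch shows how a 95% identity threshold could be applied to BLAST tabular output (`-outfmt 6`, whose first four columns are query ID, subject ID, percent identity, and alignment length). The gene and subject names in the example are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float  # percent identity (column 3 of -outfmt 6)
    length: int    # alignment length (column 4)


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse the first four columns of BLAST tabular (-outfmt 6) output."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        hits.append(Hit(cols[0], cols[1], float(cols[2]), int(cols[3])))
    return hits


def filter_by_identity(hits: Iterable[Hit], min_identity: float = 95.0) -> List[Hit]:
    """Keep hits at or above the identity threshold (95% in this study)."""
    return [h for h in hits if h.pident >= min_identity]


# Hypothetical rows with the 12 default -outfmt 6 columns
rows = [
    "geneA\tref_1\t99.2\t300\t2\t0\t1\t300\t1\t300\t1e-150\t540",
    "geneB\tref_2\t94.9\t280\t14\t0\t1\t280\t1\t280\t1e-120\t420",
]
print([h.query for h in filter_by_identity(parse_outfmt6(rows))])  # ['geneA']
```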

Genome submissions to NCBI GenBank

The genome sequence of Staphylococcus xylosus NM36 has been deposited at DDBJ/ENA/GenBank under accession number JARUHN000000000.1. The genome was annotated with the NCBI Prokaryotic Genome Annotation Pipeline (PGAP) [57].

SNP and INDEL discovery

The mapped reads were examined for single-nucleotide polymorphisms (SNPs) and insertion/deletion (INDEL) variants relative to the reference genome, RefSeq assembly GCF_000709415.1 (CP008724.1). After duplicate removal and variant calling with SAMtools, the information for each variant was collected and classified by chromosome or scaffold.
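Sorting called variants into SNPs, insertions, and deletions per contig can be sketched from the REF and ALT columns of a VCF file. This minimal version assumes an uncompressed VCF with left-anchored (normalized) indels; symbolic alleles such as `<DEL>` and multi-base substitutions fall into a catch-all `complex` class, and the contig names in the example are hypothetical.

```python
from collections import Counter, defaultdict


def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair as SNP, insertion, deletion, or complex."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"  # e.g. REF=A, ALT=AT
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"   # e.g. REF=AT, ALT=A
    return "complex"        # MNPs, block substitutions, symbolic alleles


def tally_by_contig(vcf_lines):
    """Count variant classes per CHROM from uncompressed VCF text lines."""
    counts = defaultdict(Counter)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information and header lines
        chrom, _pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):  # multi-allelic sites list several ALTs
            counts[chrom][classify_variant(ref, alt)] += 1
    return counts


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "contig_1\t10\t.\tA\tG\t50\tPASS\t.",
    "contig_1\t20\t.\tA\tAT\t50\tPASS\t.",
    "contig_2\t5\t.\tTG\tT\t50\tPASS\t.",
]
for contig, c in sorted(tally_by_contig(vcf).items()):
    print(contig, dict(c))
```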

Phylogenetic analysis

The phylogenetic analysis package at PATRIC [63] was used to place the genome among reference and representative genomes, which PATRIC includes in the phylogenetic analysis reported by the Comprehensive Genome Analysis service. Briefly, Mash/MinHash was used to identify the closest reference and representative genomes. PATRIC global protein families (PGFams) from these genomes were then selected to determine the phylogenetic placement of this genome. The protein sequences of these families were aligned with MUSCLE (multiple sequence comparison by log-expectation), and the corresponding nucleotide sequences were mapped onto the protein alignments. The combined set of amino acid and nucleotide alignments was concatenated into a data matrix, which was analyzed with RAxML using rapid bootstrapping to generate the support values in the tree. In addition, a phylogenetic tree based on 16S rRNA gene sequences was built with the NCBI Tree Viewer (TV). For this comparison, only sequences from strains whose genomes had been completely sequenced and deposited in the NCBI database were selected.
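The Mash/MinHash step can be illustrated with a toy re-implementation: each genome is reduced to a "bottom-s" sketch (the s smallest hash values of its canonical k-mers), the Jaccard index j is estimated from the merged sketches, and it is converted to the Mash distance D = -(1/k)·ln(2j/(1 + j)). This is a sketch under stated assumptions: it uses BLAKE2b instead of Mash's MurmurHash3, so its numbers will not match the Mash tool, and k = 21 with s = 1000 simply mirror Mash's defaults.

```python
import hashlib
import math

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Set of canonical k-mers (lexicographic minimum of k-mer and reverse complement)."""
    seq = seq.upper()
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing ambiguous bases
        rc = kmer.translate(_COMPLEMENT)[::-1]
        out.add(min(kmer, rc))
    return out


def hash64(kmer):
    # 64-bit hash from BLAKE2b (assumption; Mash itself uses MurmurHash3)
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(seq, k=21, size=1000):
    """Bottom-s MinHash sketch: the `size` smallest distinct k-mer hash values."""
    return sorted({hash64(km) for km in canonical_kmers(seq, k)})[:size]


def mash_distance(sketch_a, sketch_b, k=21, size=1000):
    """Estimate Jaccard from two sketches, then D = -(1/k) * ln(2j / (1 + j))."""
    a, b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(a | b)[:size]  # bottom-s sketch of the union
    if not union_bottom:
        raise ValueError("empty sketches")
    shared = sum(1 for h in union_bottom if h in a and h in b)
    j = shared / len(union_bottom)
    if j == 0:
        return 1.0  # no shared hashes: report the maximal distance
    return -math.log(2 * j / (1 + j)) / k
```

Identical inputs give a distance of 0, and the distance grows as fewer k-mers are shared, which is what lets Mash rank candidate reference genomes quickly before the slower protein-family alignment.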

Results

Genome annotation

Based on the annotation data and comparison with other genomes of the same species in PATRIC, this genome is considered to be of good quality. This was supported by the proportions of bases with Phred quality scores above Q20 and Q30, which were 98.47% and 94.34%, respectively, after read filtering. The Comprehensive Genome Analysis showed that the assembled genome comprises 73 contigs with a total length of 2,668,086 bp and 2454 protein-coding genes (Table 1; Additional file 1: Appendix A, Table S1). The average GC content is 32.8%. GC content and GC skew are shown in Fig. 1. A subsystem is a set of proteins that together implement a specific biological process or structural complex; the subsystem analysis performed during annotation identified 278 subsystems in this genome, summarized in Fig. 2. Variant analysis identified 46,610 SNPs, 523 insertions, and 551 deletions relative to the reference genome (Additional file 1: Appendix B, Table S2).
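Contig count, total length, and average GC content are straightforward to recompute from the assembly FASTA. A minimal sketch, assuming a plain-text FASTA and computing GC over unambiguous bases only (N and other IUPAC codes are excluded from the denominator); the function names are hypothetical, and PATRIC's report may count ambiguous bases differently.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def assembly_summary(lines):
    """Return contig count, total length (bp), and GC content (%) of an assembly."""
    n_contigs = total = gc = acgt = 0
    for _header, seq in read_fasta(lines):
        seq = seq.upper()
        n_contigs += 1
        total += len(seq)
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(b) for b in "ACGT")  # unambiguous bases only
    return {
        "contigs": n_contigs,
        "total_bp": total,
        "gc_percent": 100.0 * gc / acgt if acgt else 0.0,
    }


toy = [">contig_1", "ATGC", "GG", ">contig_2", "ATNN"]
print(assembly_summary(toy))  # {'contigs': 2, 'total_bp': 10, 'gc_percent': 50.0}
```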

Table 1 Summary of the genome analysis report from the NCBI Prokaryotic Genome Annotation Pipeline (PGAP)
Fig. 1

Circular genome representation of Staphylococcus xylosus NM36 (PROKSEE server). The innermost ring shows chromosome position, and the red ring represents the genome backbone (contigs). A GC content (black) and GC skew (green, values above the genome average; purple, values below the genome average). B Open reading frames on the forward strand (outside the backbone) and the reverse strand (inside the backbone)
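GC skew is conventionally computed per window as (G - C)/(G + C). A minimal sketch of that calculation, with each window labeled relative to the genome-wide mean in the spirit of the green/purple coloring described above; the window and step sizes are arbitrary assumptions, and this is not PROKSEE's own code.

```python
def gc_skew_windows(seq, window=1000, step=1000):
    """Return (start, skew) for each full window, with skew = (G - C) / (G + C)."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        win = seq[start:start + window]
        g, c = win.count("G"), win.count("C")
        result.append((start, (g - c) / (g + c) if g + c else 0.0))
    return result


def label_relative_to_mean(values):
    """Label each value as above, below, or equal to the mean of all values."""
    if not values:
        return []
    mean = sum(values) / len(values)
    return ["above" if v > mean else "below" if v < mean else "equal" for v in values]


skews = gc_skew_windows("G" * 10 + "C" * 10, window=10, step=10)
print(skews)                                           # [(0, 1.0), (10, -1.0)]
print(label_relative_to_mean([s for _, s in skews]))   # ['above', 'below']
```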

Fig. 2

Distribution of subsystem categories in Staphylococcus xylosus NM36. The genome was annotated with the Rapid Annotations using Subsystems Technology (RAST) server. The pie chart shows the number of features in each subsystem category, and the SEED viewer shows subsystem coverage: the green bar represents the proportion of proteins included in subsystems, and the blue bar represents the proportion of proteins not included in subsystems

Phylogenetic analysis

PATRIC provided the reference and representative genomes included in the phylogenetic analysis. The closest reference and representative genomes are shown in Fig. 3; strains S. xylosus HKUOPL8 and NJ (ANMR00000000.1) showed the highest similarity to NM36. In addition, a 16S rRNA gene tree was built with the NCBI Tree Viewer (TV) using sequences from strains whose genomes had been fully sequenced (Fig. 4). On this basis, five related genomes were selected for sequence-based comparison with the RAST tool, which revealed that strain NM36 and the closely related strains carry a high abundance of coding DNA sequences (CDS), mainly for carbohydrate and amino acid metabolism (Figs. 2 and 5). Across all five genomes, 1188 genes share the same annotated function with 95–100% identity.
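Counting genes that keep the same annotated function at ≥95% identity in every compared genome amounts to an intersection over genomes. A minimal sketch, assuming best-hit results have already been collected per genome; the gene IDs and functions in the example are hypothetical, and the genome keys reuse two accessions from the text purely as labels.

```python
def conserved_genes(nm36_functions, hits_by_genome, min_identity=95.0):
    """Return NM36 gene IDs whose best hit in every genome has the same
    annotated function at or above `min_identity` percent identity.

    nm36_functions: {gene_id: function annotated in NM36}
    hits_by_genome: {genome_id: {gene_id: (function of best hit, percent identity)}}
    """
    if not hits_by_genome:
        return set()
    conserved = set()
    for gene, function in nm36_functions.items():
        if all(
            gene in hits
            and hits[gene][0] == function
            and hits[gene][1] >= min_identity
            for hits in hits_by_genome.values()
        ):
            conserved.add(gene)
    return conserved


nm36 = {"g1": "IcaA", "g2": "TcaR", "g3": "Atl"}
hits = {
    "CP007208.1": {"g1": ("IcaA", 99.0), "g2": ("TcaR", 94.0), "g3": ("Atl", 98.0)},
    "ANMR00000000.1": {"g1": ("IcaA", 96.0), "g2": ("TcaR", 99.0),
                       "g3": ("hypothetical protein", 97.0)},
}
print(conserved_genes(nm36, hits))  # {'g1'}
```

Here g2 fails the identity cut-off in one genome and g3 fails the function match in the other, so only g1 is counted.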

Fig. 3

Phylogenetic relationship of S. xylosus NM36 based on genome features (genome annotation service in PATRIC)

Fig. 4

Distance tree based on 16S rRNA sequences showing the relationship between S. xylosus NM36 (highlighted in yellow) and related S. xylosus strains with complete genomes in the NCBI database, generated with NCBI TV

Fig. 5

Graphical genome comparison map of strain NM36 (reference) with five closely related strains, generated with the sequence-based comparison tool of the SEED Viewer in the RAST server. Rings from outside to inside: (1) strain HKUOPL8 (CP007208.1), (2) strain NJ (ANMR00000000.1), (3) strain SMQ-121 (CP008724.1), (4) strain 2 (CP031275.1), and (5) strain 2.1523 (CP066721.1). Colors from purple (100%) to pale red (10%) indicate amino acid similarity to the reference genome. The genome of the NM36 reference strain itself is not depicted

Resistance to antibiotics and toxic compounds

Many of the annotated genes are homologous to known virulence factors, transporters, drug targets, and antimicrobial resistance genes. We combined the results from the SEED, PATRIC, and PROKSEE servers into a single non-redundant list. We then examined AMR gene sequence variants and assigned each AMR gene a functional annotation, a drug class, and the specific antibiotic to which it confers resistance. The AMR genes annotated in this genome and the corresponding resistance mechanisms are summarized in Table 2.
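Merging calls from several annotation tools into one non-redundant list can be sketched as grouping by gene while recording which tools reported it. This assumes each tool's output has already been reduced to (gene, drug class, antibiotic) tuples; the gene names below are placeholders, not entries from Table 2.

```python
from collections import defaultdict


def merge_amr_calls(*sources):
    """Merge per-tool AMR calls into one non-redundant table keyed by gene.

    Each source is (tool_name, [(gene, drug_class, antibiotic), ...]).
    Returns {gene: {"drug_classes": set, "antibiotics": set, "tools": set}}.
    """
    merged = defaultdict(lambda: {"drug_classes": set(), "antibiotics": set(), "tools": set()})
    for tool, calls in sources:
        for gene, drug_class, antibiotic in calls:
            entry = merged[gene]
            entry["drug_classes"].add(drug_class)
            entry["antibiotics"].add(antibiotic)
            entry["tools"].add(tool)  # remember which tools support this gene
    return dict(merged)


table = merge_amr_calls(
    ("SEED", [("geneA", "tetracyclines", "tetracycline")]),
    ("PATRIC", [("geneA", "tetracyclines", "tetracycline"),
                ("geneB", "fosfomycin", "fosfomycin")]),
    ("PROKSEE", [("geneB", "fosfomycin", "fosfomycin")]),
)
for gene, info in sorted(table.items()):
    print(gene, sorted(info["tools"]))
```

Keeping the set of supporting tools per gene makes it easy to see which calls are corroborated by more than one server.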

Table 2 S. xylosus NM36 genome analysis for antibiotic resistance. The annotation was based on the protein domain database [38,39,40,41]

Quorum sensing and biofilm formation

Because biofilm formation is essential to staphylococcal biology, the formation and disassembly of biofilms are tightly controlled by several regulatory systems that integrate the physiological state of the cell, environmental cues, and population dynamics within the staphylococcal community. The annotated genes involved in biofilm formation in S. xylosus NM36 are listed in Table 3.

Table 3 S. xylosus NM36 genome analysis for genes involved in regulation and cell signaling: quorum sensing and biofilm formation. The annotation was based on the protein domain database [38,39,40,41]

Discussion

Although S. xylosus is considered nonpathogenic, several studies have linked it to opportunistic infections in animals and humans. S. xylosus is widespread and occurs in numerous environments, such as contaminated water, animal feed, and soil [1, 2, 13, 14, 19, 27, 33, 36, 37, 62]. It may contribute to the pathogenicity of other staphylococci through horizontal transfer of antibiotic resistance elements, such as the SCCmec type 11 region [37] or tetracycline resistance genes, whose transfer involving S. xylosus has been demonstrated in situ during sausage fermentation [33]; such transfer increases the risk of antibiotic resistance and poses a significant public health concern. In Iraq, information on S. xylosus is limited: it is reported only occasionally in clinical investigations [2, 51] and, like other coagulase-negative staphylococci, receives less attention than S. aureus. The pathogenicity of staphylococci has been linked primarily to their capacity to resist antimicrobials and to form biofilms. Biofilm formation begins with bacterial attachment to biotic or abiotic surfaces, followed by the accumulation of multilayered cell aggregates. This process facilitates the internalization and survival of staphylococci within host cells [54], so strains with this trait are regarded as more virulent. S. xylosus NM36 carries a number of determinants associated with staphylococcal adherence to biotic and abiotic surfaces, with the different phases of biofilm formation, and with antimicrobial resistance, as summarized in Tables 2 and 3. These results are consistent with the multidrug resistance and biofilm formation phenotypes observed at initial isolation. Comparison of the NM36 genome with clinical reference strains revealed its repertoire of antibiotic resistance and virulence genes. In addition, S. xylosus NM36 contains 9 antibiotic resistance determinants responsible for resistance to 10 known antibiotics, including quinolones, methicillin, teicoplanin, bicyclomycin, chloramphenicol, fosfomycin, ampicillin, cefoxitin, oxacillin, and tetracycline. The NM36 genome harbors the ica operon and the transcriptional regulator TcaR, both of which have been implicated in staphylococcal biofilm formation. It also contains the global regulators agr (accessory gene regulator) and sarA (staphylococcal accessory regulator), the major autolysin gene atl (autolysin), and the two-component systems arlRS and srrAB, which are involved in regulating adhesion and biofilm formation. Strain HKUOPL8 (CP007208.1) shares the highest degree of protein similarity with NM36 (Figs. 3 and 5) [36]. According to the genome data on the NCBI website, strain HKUOPL8 (CP007208.1) was isolated from a clinical sample (panda feces) [36], strain NJ (ANMR00000000.1) from a human nasal swab, strain SMQ-121 (CP008724.1) from fermented sausage [31], strain 2 (CP031275.1) from a milker’s hand, and strain 2.1523 (CP066721.1) from fermented sausage. The variation in genome similarity may be attributable to differences in lifestyle and isolation source, which may have shaped the genetic composition of these isolates [48, 64]. INDELs are a major source of genetic diversity and can substantially affect the properties or evolvability of a protein [52]. Relative to the reference genome, NM36 carries 46,610 SNPs, 523 insertions, and 551 deletions. Whether any of these mutations are advantageous, moving the encoded proteins toward higher fitness under current selective pressures, will require further research.
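One first-pass way to gauge how INDELs might affect proteins is to check whether each one preserves the reading frame: a net length change that is a multiple of three keeps the downstream codons intact, while any other change causes a frameshift. The sketch below assumes each indel lies entirely within a coding sequence; it does not check for newly created stop codons or overlapping genes, and it is not part of the analysis reported here.

```python
def indel_effect(ref, alt):
    """Classify a VCF REF/ALT pair lying entirely within a coding sequence."""
    delta = len(alt) - len(ref)  # net gain (+) or loss (-) of bases
    if delta == 0:
        return "substitution"
    # Python's % returns a non-negative result, so -3 % 3 == 0 as well
    return "in-frame" if delta % 3 == 0 else "frameshift"


def summarize(indels):
    """Count frame effects over an iterable of (ref, alt) pairs."""
    counts = {"in-frame": 0, "frameshift": 0, "substitution": 0}
    for ref, alt in indels:
        counts[indel_effect(ref, alt)] += 1
    return counts


print(summarize([("A", "ATTT"), ("ATG", "A"), ("AT", "A")]))
# {'in-frame': 1, 'frameshift': 2, 'substitution': 0}
```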
As with conventional molecular typing, isolates recovered from an outbreak or cluster of infections are likely to have closely related genomes and may share pathogenic traits through horizontal gene transfer [12]. In recent years, the widespread availability of whole-genome sequencing (WGS) has made it possible to examine patterns of spread in greater detail, including previously undocumented transmission. WGS can be used to investigate outbreaks and track the spread of infection; unlike conventional molecular typing techniques such as spa typing, pulsed-field gel electrophoresis (PFGE), and multilocus sequence typing (MLST), it enables comparison of entire genomes, thereby improving the resolution and accuracy of metabolic and subsystem maps [50]. However, genome sequences have accumulated in databases sporadically, with sampling of natural variation biased by medical and epidemiological priorities. For instance, epidemic lineages of methicillin-resistant Staphylococcus aureus (MRSA) are sequenced preferentially over susceptible isolates (methicillin-sensitive S. aureus, MSSA). As more diverse genomes are sequenced, a picture is emerging of a highly subdivided species with a limited number of relatively clonal groups (complexes) that dominate specific geographic regions at any given time, as reviewed by Planet et al. [48]. Our findings support this view and argue for whole-genome surveillance of non-S. aureus staphylococcal populations in animals, which could lead to more accurate prediction of the antibiotic resistance and virulence of emerging clones. Ultimately, this could provide a better understanding of the poorly understood biological factors that determine recurrent strain dominance in endemic areas.
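In WGS-based outbreak investigation, band patterns or allele profiles are typically replaced by pairwise SNP distances between isolates. The sketch below assumes a core-genome alignment has already been produced; it counts SNP differences (ignoring gaps and Ns) and groups isolates by single linkage under a threshold. The isolate names and the threshold value are hypothetical: there is no universal SNP cut-off, and none was applied in this study.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count differing aligned positions, ignoring gaps ('-') and Ns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    skip = {"-", "N"}
    return sum(1 for x, y in zip(a.upper(), b.upper())
               if x != y and x not in skip and y not in skip)


def pairwise_snp_matrix(alignment):
    """Map each unordered pair of isolate names to its SNP distance."""
    return {(i, j): snp_distance(alignment[i], alignment[j])
            for i, j in combinations(sorted(alignment), 2)}


def snp_clusters(alignment, threshold):
    """Single-linkage clusters: isolates within `threshold` SNPs are joined."""
    parent = {name: name for name in alignment}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), d in pairwise_snp_matrix(alignment).items():
        if d <= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for name in alignment:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)


aln = {"iso1": "ACGTACGTAC", "iso2": "ACGTACGTAA", "iso3": "TTTTACGTAC"}
print(pairwise_snp_matrix(aln))
print(snp_clusters(aln, threshold=2))  # iso1 and iso2 group together; iso3 stays separate
```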
In this study, we sequenced the genome of Staphylococcus xylosus NM36, a coagulase-negative staphylococcus that is often missed in routine laboratory examinations. The virulence traits of S. xylosus NM36 add a new variable to the complex epidemiology of mastitis in Basrah Governorate.

Conclusion

This study is the first investigation of the genomic characteristics of S. xylosus in Iraq. It underscores the need for whole-genome sequencing and comparative genomic analysis to gain deeper insight into the origins of multidrug-resistant isolates and the methods used to test them. Furthermore, microbiological and therapeutic approaches to the management of coagulase-negative staphylococci need to be reassessed, especially in the context of animal disease and public health.