Complete genome sequence of methicillin-sensitive Staphylococcus aureus containing a heterogeneic staphylococcal cassette chromosome element

Staphylococcus aureus is a common human bacterium that sometimes becomes pathogenic, causing serious infections. A key feature of S. aureus is its ability to acquire resistance to antibiotics. The presence of the staphylococcal cassette chromosome (SCC) element in serotypes of S. aureus has been confirmed using multiplex PCR assays. The SCC element is the only vector known to carry the mecA gene, which encodes methicillin resistance in S. aureus infections. Here, we report the genome sequence of a novel methicillin-sensitive S. aureus (MSSA) strain: SCC-like MSSA463. This strain was originally erroneously serotyped as methicillin-resistant S. aureus in a clinical laboratory using multiplex PCR methods. We sequenced the genome of SCC-like MSSA463 using pyrosequencing techniques and compared it with known genome sequences of other S. aureus isolates. An open reading frame (CZ049; AB037671) was identified downstream of attL and attR inverted repeat sequences. Our results suggest that a lateral gene transfer occurred between S. aureus and other organisms, partially changing S. aureus infectivity. We propose that attL and attR inverted repeats in S. aureus serve as frequent insertion sites for exogenous genes.

Staphylococcus aureus is a common human pathogen that can cause serious infections such as necrotizing pneumonia and sepsis. With the emergence and spread of methicillin-resistant S. aureus (MRSA) and of vancomycin- and linezolid-resistant strains [1-3], a decreasing number of antibiotics are effective against this pathogen. It is therefore imperative to investigate the molecular basis and mechanisms of virulence and drug resistance in S. aureus.
One distinct feature of S. aureus is its ability to acquire antibiotic resistance. Staphylococcal cassette chromosome (SCC) elements are currently the only vectors described for mecA, a gene that encodes methicillin resistance in Staphylococcus species. A lateral transfer event involving SCCmec was reportedly responsible for the acquired resistance [4]. The SCCmec element was originally thought to exist only as a complete sequence, but SCC elements have recently been detected in methicillin-sensitive S. aureus (MSSA) strains, suggesting loss of the portion of the SCC element that confers methicillin resistance [5]. High-throughput genome sequencing methods, especially those using next-generation platforms, afford an unprecedented ability to decipher sequence variations associated with the mechanism of resistance. In a previous study, the methicillin-susceptible strain MSSA476 was found to contain a novel SCCmec-like element (designated SCC476). This SCCmec-like element is integrated at the same chromosomal site as SCCmec elements in MRSA strains, but encodes a putative fusidic acid resistance protein [6]. These findings provide useful information regarding the molecular basis of SCCmec-like elements in MSSA.
In an earlier study [7], we discovered that the SCCmec III type-specific fragment open reading frame (ORF) CZ049 (AB037671; MRSA85/2082), previously considered unique and specific to SCCmec III in MRSA strains, is also present in the MSSA463 strain. This suggested that either fragment transfer or another structural change occurred during evolution of the strain [7]. In the study reported here, we used genomic techniques to determine key features of the SCC-like MSSA463 strain to gain new insights into virulence mechanisms of S. aureus.

Ethics statement
The institutional review boards and ethics committees at First Affiliated Hospital of Nanchang University and Peking University People's Hospital approved the study protocol. Written informed consent was obtained from the patient from whom the bacterial strain was acquired prior to the beginning of this study. The study was conducted in accordance with the Declaration of Helsinki.

Genome sequencing and assembly
We used a shotgun approach in conjunction with pyrosequencing [8] on a Roche/454 GS FLX platform (Roche Applied Science, Branford, US) to obtain 70-fold genomic coverage of MSSA463. The resulting sequence data were assembled using Newbler, an assembly program provided by the manufacturer. After refining the assembly to eliminate ambiguity, sequence gaps were closed by sequencing a series of PCR products. We used Glimmer 3.02 to predict open reading frames (ORFs) [9], which were further annotated against the non-redundant (nr) database using the Basic Local Alignment Search Tool (BLAST) (http://blast.ncbi.nlm.nih.gov/Blast.cgi) with an E-value cut-off of 10^-5. Proteins without homologs in the nr database were termed hypothetical proteins; those with homologs of unknown function were designated as conserved hypothetical proteins. Further analyses of the genome sequence were carried out using other databases, including UniProtKB/Swiss-Prot (http://www.ebi.ac.uk/uniprot/, E-value cut-off of 10^-10) [10], Kyoto Encyclopedia of Genes and Genomes (KEGG) (Release 50; E-value cut-off of 10^-10) [11], Clusters of Orthologous Groups (COG) (E-value cut-off of 10^-10), InterPro (InterProScan 4.3, Release 16.0) [12], and Gene Ontology (GO) [13] databases (Table S2).
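The naming rule for proteins without informative homologs can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the `Hit` record, the `classify_orf` function, and the keyword list used to recognize homologs of unknown function are all assumptions, and the sketch assumes an nr cut-off of 10^-5.

```python
from dataclasses import dataclass

# Assumed E-value cut-off for the nr search.
E_VALUE_CUTOFF = 1e-5

# Assumed keywords that mark a homolog whose function is unknown.
UNKNOWN_MARKERS = ("hypothetical protein", "unknown function", "uncharacterized")


@dataclass
class Hit:
    """One BLAST hit for an ORF (hypothetical, simplified record)."""
    description: str
    e_value: float


def classify_orf(hits):
    """Return an annotation label for one ORF given its BLAST hits."""
    # Keep only hits that pass the E-value cut-off.
    significant = [h for h in hits if h.e_value <= E_VALUE_CUTOFF]
    if not significant:
        # No homolog in the database.
        return "hypothetical protein"
    best = min(significant, key=lambda h: h.e_value)
    if any(marker in best.description.lower() for marker in UNKNOWN_MARKERS):
        # A homolog exists, but its function is unknown.
        return "conserved hypothetical protein"
    return best.description
```

In practice the decision would rest on curated annotation rather than keyword matching; the keywords here only stand in for that judgment.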

Genome assembly and annotation
Using the GS de novo assembler Newbler [8], we performed a de novo assembly of the initial reads (average read length of 352 bp and 70-fold coverage; Table S3), which yielded 33 contigs, 22 of which were longer than 500 bp. The contigs covered 98% of the genome (Table S3).
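The relationship between read count, read length, and fold coverage can be checked with a short calculation. The sketch below uses the standard estimate, coverage = number of reads × mean read length / genome length. The read count it derives is an estimate, not a figure reported in this study, and the genome length used is the final assembled size of MSSA463 (2771498 bp).

```python
def fold_coverage(n_reads, mean_read_len, genome_len):
    """Expected fold coverage for a given number of reads."""
    return n_reads * mean_read_len / genome_len


def reads_needed(coverage, mean_read_len, genome_len):
    """Number of reads required to reach a target fold coverage."""
    return coverage * genome_len / mean_read_len


GENOME_LEN = 2_771_498   # bp, final assembled genome length
MEAN_READ_LEN = 352      # bp, average read length
TARGET_COVERAGE = 70     # fold coverage

# Derived estimate only; the actual read count is given in Table S3.
estimated_reads = reads_needed(TARGET_COVERAGE, MEAN_READ_LEN, GENOME_LEN)
print(f"about {estimated_reads:,.0f} reads")
```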
After gap closure, the final MSSA463 genome sequence was 2771498 bp long and contained 2563 predicted ORFs, with a gene density of 0.92 genes per kilobase and a simple repeat representation of 0.93% over the entire length. The observed low G+C content (32.45%) is typical of Firmicutes. The longest ORF in the genome was 28236 bp long; the average gene length was 903 bp. The genome contained 5 rRNA operons, 59 tRNA genes, and 29 small RNA coding sequences (Table S4, Figure 1).
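The summary statistics above follow from simple definitions, sketched here in Python. The function names are illustrative assumptions; the G+C helper is shown on a toy sequence only, because the genome sequence itself is not reproduced in this text.

```python
def gene_density_per_kb(n_genes, genome_len_bp):
    """Number of genes per kilobase of genome."""
    return n_genes / (genome_len_bp / 1000)


def gc_content_percent(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return 100 * (seq.count("G") + seq.count("C")) / acgt


# Reported values: 2563 predicted ORFs in a 2771498 bp genome.
density = gene_density_per_kb(2563, 2_771_498)
print(round(density, 2))  # 0.92

# Toy sequence, not taken from the MSSA463 genome.
print(gc_content_percent("ATGCATAT"))  # 25.0
```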
We used the COG database for gene annotation. Each COG cluster includes proteins that are inferred to be orthologs, i.e., direct evolutionary counterparts (Figure S1). There were 208 orthologs classified into the unknown functions category and 232 orthologs for which only general functions were predicted (Figure S1). Gene Ontology categories for MSSA463 ORFs are shown in Figure S2.

Multiple alignments of S. aureus genomes
Using the Artemis Comparison Tool [14], we aligned the MSSA463 genome sequence to four other S. aureus genome assemblies: MSSA476, a community-acquired MSSA; USA300, a community-acquired MRSA; N315, a hospital-acquired MRSA; and Mu50, a vancomycin-intermediate S. aureus (VISA) strain. All of these genomes were ~2.8 Mb in size and had a G+C content of 32%. Overall, 2235 genes (87.2%) in the MSSA463 genome were also present in all four of the other strains. Eighty-seven genes (3.4%) were found in only three of the four reference strains, and an additional 40 genes (1.6%) were present in only two of the reference strains. Fifty-six genes (2.2%) were shared with only one reference genome. In total, we identified 145 genes that were unique to MSSA463 (Figure S3).
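The gene-sharing counts above partition the 2563 predicted ORFs, which a short Python check makes explicit. The tallying function and its input format (a mapping from gene ID to the set of reference strains containing that gene) are illustrative assumptions, not the authors' method. The percentage for the 145 unique genes is derived here and is not stated in the text.

```python
from collections import Counter

REFERENCE_STRAINS = ("MSSA476", "USA300", "N315", "Mu50")
TOTAL_ORFS = 2563


def tally_by_reference_count(presence):
    """Count genes by the number of reference strains that contain them.

    `presence` maps a gene ID to the set of reference strains with that gene.
    """
    refs = set(REFERENCE_STRAINS)
    return Counter(len(set(strains) & refs) for strains in presence.values())


# Reported counts, keyed by the number of reference strains sharing the gene.
reported = {4: 2235, 3: 87, 2: 40, 1: 56, 0: 145}

# The five categories account for every predicted ORF.
assert sum(reported.values()) == TOTAL_ORFS

percentages = {k: round(100 * v / TOTAL_ORFS, 1) for k, v in reported.items()}
for n_refs, pct in percentages.items():
    print(f"shared with {n_refs} reference strains: {reported[n_refs]} genes ({pct}%)")
```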

Analysis of mobile elements
Using reference strains MSSA476 and MRSA252, we examined the MSSA463 genome for variations in genomic islands and mobile regions, the locations of genes involved in virulence or drug resistance [15]. Although we did not find the typical SCC structure in MSSA463 (Table 1), temperate bacteriophages were present; this was expected, as most known S. aureus strains harbor more than one bacteriophage. MSSA463 possesses the same νSaα-νSaβ islands, including four staphylococcal exotoxin genes (spl, lukDE, hysA, and bsa), that are present in six other S. aureus strains (MSSA476, MRSA252, N315, Mu50, MW2, and NCTC8325). When we analyzed pathogenicity islands, we found SaPI6 in MSSA463 and MSSA476, but not in NCTC8325, COL, USA300, or MW2. Similarly, two pathogenicity islands, SaPI4 and SaPI1028, were detected only in MRSA252. Insertion sequences were also found in the S. aureus strains. For example, IS1272 and IS431 were present in MRSA252 and IS431 was present in MSSA476, whereas no transposons were detected in

Pathogenicity-associated coding sequences (CDS) and cell wall/membrane-associated resistance genes
We used the KEGG database for further functional gene prediction in MSSA463, with a focus on disease-related CDS. Multiple genes associated with infectious disease were identified in the MSSA463 genome. These included fnbB and fnbA, which are involved in bacterial invasion of epithelial cells; K08303 (a putative protease); and ureA, ureB, and ureC, which are associated with Helicobacter pylori infections of epithelial cells. Other identified genes included luxS, associated with the Vibrio cholerae pathogenic cycle, and arginases such as rocF (K0416; E3.5.3.1) and arg, implicated in eukaryotic amebiasis (Table S5). Virulence and drug-resistance proteins are generally thought to be located on the cell wall or membrane [16]. Some exogenous pathogenic factors, such as adhesion-related SdrD, beta-channel forming cytolysin, and clumping factor B, were encoded by ORFs of MSSA463. Although MSSA463 is sensitive to methicillin, we found some resistance protein-coding genes, including those for bicyclomycin resistance protein TcaB and chloramphenicol resistance protein. These results are consistent with the phenotypic characteristics of the strain.

Phylogenetic relationship of MSSA463 and its SCC-like element to other S. aureus strains
We used a web-based tool, Interactive Tree Of Life (iTOL), to generate and display phylogenetic trees [17] (Figure S3). In this tree, MSSA463 is phylogenetically closest to MRSA252 (gi 49240382; Figure 2).
Based on the generated phylogeny, we selected several strains, including MSSA463, MRSA85/2082, MSSA476, and MRSA252 [6], for a more detailed comparative analysis of the SCC-like element (Figure S4). All SCCmec elements were typical, integrating at exactly the same site across all strains. The inverted repeat sequences found in MRSA strains were also present in MSSA463; in contrast, SCCmec components (including the conserved site-specific recombination enzymes ccrA and ccrB) were completely missing from MSSA463. The specific ORF CZ049 sequence (the basis of diagnostic detection by PCR assays) was present in MSSA463, but was located 1686 bp downstream of the attL and attR inverted repeat sequences, rather than between them as in other strains.
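The positional distinction drawn here, between an ORF lying between the inverted repeats and one lying downstream of them, amounts to a coordinate comparison. The Python sketch below illustrates it under stated assumptions: the coordinates are hypothetical placeholders (the actual positions are not given in the text), the 1686 bp offset is assumed to be measured from the end of the att-flanked region, and strand orientation is ignored.

```python
def position_relative_to_att(feature_start, region_start, region_end):
    """Classify a feature start against the att-flanked region.

    The region [region_start, region_end] spans attL through attR.
    Returns a (label, distance_in_bp) tuple.
    """
    if feature_start < region_start:
        return ("upstream", region_start - feature_start)
    if feature_start <= region_end:
        return ("between", 0)
    return ("downstream", feature_start - region_end)


# Hypothetical coordinates, for illustration only.
REGION_START, REGION_END = 10_000, 10_500

# An ORF starting 1686 bp past the att-flanked region, as described for MSSA463.
print(position_relative_to_att(REGION_END + 1686, REGION_START, REGION_END))

# An ORF inside the att-flanked region, as described for the other strains.
print(position_relative_to_att(10_200, REGION_START, REGION_END))
```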

Discussion
Following the sequencing of the first S. aureus N315 strain, 22 additional S. aureus genomes have been deposited in the NCBI genome database (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi). Because the strain investigated in our study, corresponding to the 22nd genome sequence added to the database, is not a typical SCC S. aureus, it has been designated as SCC-like MSSA463.
MSSA463 is pathogenic. Many pathogenic bacteria can invade both phagocytic and non-phagocytic cells and colonize them intracellularly, with subsequent dissemination to other cell types [18-21]. Entry into non-phagocytic host cells, such as epithelial cells, occurs via two different mechanisms, i.e., the "zipper" and "trigger" models. Pathogenic bacteria that enter their hosts using the zipper model include Listeria, Streptococcus, and Yersinia. The presence of fnbB and fnbA in MSSA463 suggests that S. aureus uses these zipper model genes to initiate the infection cascade. The abundant presence of other pathogenic genes in MSSA463, including ureABC, luxS, rocF, and arg, is evidence for pathways shared with H. pylori, V. cholerae, and infectious amoebae [16,22-24]. These shared pathways raise the possibility of frequent horizontal gene transfer between these pathogens as well as between prokaryotic and eukaryotic pathogens [25,26].
In addition to its pathogenicity, MSSA463 is drug- and antibiotic-resistant. With the development of genome sequencing technology and the consequent accumulation of genome-scale information, our understanding of drug resistance has significantly improved. Katayama et al. [27] were the first to demonstrate that the MRSA methicillin-resistance gene is carried by a novel genetic element, SCCmec, which is integrated into and excised from the S. aureus chromosome through the mediation of a unique set of recombinase genes: ccrA and ccrB. They were also the first to define the structure of SCCmec elements. SCCmec elements are currently classified into different groups based on the nature of mec and ccr gene complexes, and are further classified into different subtypes according to characteristics of their "junkyard" regions [28,29]. In MRSA strains, two SCCmec elements are located within the left and right boundaries of attL and attR inverted repeat sequences, respectively. It is typically believed that these two SCCmec elements co-exist, and that the lateral transfer of this element as a whole from other bacteria gave rise to methicillin resistance in S. aureus. The presence of SCC476 elements in MSSA476, however, suggests that other forms of the SCCmec element may also exist in MSSA strains. The existence of these elements may be a common phenomenon in S. aureus, or may simply be the result of the loss of other elements from MRSA strains. While this theory needs to be further explored, these findings increase our understanding of the molecular details of bacterial resistance [7].
Our discovery of an SCC-like sequence in MSSA463 has important implications for drug resistance diagnosis. Previous preliminary molecular epidemiology studies of S. aureus in lower respiratory tract infections [7] used multiplex PCR assay methods to characterize and subtype SCCmec elements. The presence of ORF CZ049 was considered to be specific to SCCmec III strains such as MRSA85/2082 [28]. In the MRSA85/2082 strain, ORF CZ049 is located between attL and attR inverted repeat sequences, as defined by the SCCmec structure. Although this SCCmec III type-specific fragment is also present in other strains, including MRSA strain JCSC3624(WIS) [30] and MSSA strain MSSA463, it has been considered a unique marker for SCCmec III in MRSA strains [28]. Interestingly, ORF CZ049 is also found in MSSA463.
One possible explanation for the unusual location of CZ049 in MSSA463 is that MSSA463 is derived from an MRSA strain in which ORF CZ049 and the SCC element were in close proximity in the chromosome. Alternatively, the SCCmec element may have been duplicated upstream of ORF CZ049, with subsequent loss of the SCCmec element at the original location. A third possibility is that the abnormal location of CZ049 arose as a consequence of chromosome recombination events [31,32]. Whether the downstream location of ORF CZ049 is the result of a short element insertion or is instead due to secondary changes associated with loss of other SCCmec components remains to be determined.
In summary, we used genome sequencing and comparative analysis to acquire novel insights into S. aureus pathology and resistance mechanisms and the infection cascades involved in its pathogenicity. The role of inverted repeat sequences in antibiotic resistance requires further study. As more non-classical SCC-like genome sequences are completed, we expect that a more complete understanding of the mechanisms underlying generation and function of antibiotic resistance genes in S. aureus will be developed.

Figure S1 Clusters of Orthologous Groups (COG) of protein categories of MSSA463. The numbers and the proportion of Staphylococcus aureus orthologs with various biological roles are shown. These genes are involved in cell cycle control; cell division; chromosome partitioning; cell motility; cell wall/membrane/envelope biogenesis; defense mechanisms; intracellular trafficking, secretion, and vesicular transport; post-translational modification; protein turnover; chaperones; signal transduction mechanisms; replication; recombination and repair; transcription; translation; ribosomal structure and biogenesis; amino acid transport and metabolism; carbohydrate transport and metabolism; coenzyme transport and metabolism; energy production and conversion; inorganic ion transport and metabolism; lipid transport and metabolism; nucleotide transport and metabolism; and secondary metabolite biosynthesis, transport, and catabolism. There are 208 orthologs annotated with unknown functions and 232 orthologs with only a predicted general function.

The supporting information is available online at life.scichina.com and www.springerlink.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.