Introduction

Shigella is the second leading cause of diarrheal deaths globally, mainly among children under 5 years of age. Shigella flexneri and Shigella sonnei are the predominant serogroups causing diarrhea in developing countries like India, while the other two serogroups are relatively uncommon [1]. Historically, S. sonnei has been seen mainly in developed countries, but its spread into developing countries over recent decades has raised major public health concerns [2]. Owing to its low infectious dose, clinical severity, serotype-specific immunity, emerging antimicrobial resistance and restriction to humans as the only natural host, Shigella is categorized as a priority pathogen among enteric bacteria in the Global Antimicrobial Resistance Surveillance System (GLASS) of the World Health Organization (WHO) [3].

The key virulence factors involved in the pathogenesis of Shigella are located on both the plasmid and the chromosome of the pathogen, enabling it to survive intracellularly. Shigellosis is generally self-limiting, but antibiotic treatment reduces the duration of symptoms and of pathogen shedding, which in turn reduces transmission. Increasing awareness of the disease burden and of the emerging threat posed by drug-resistant Shigella has generated interest in the development of Shigella vaccines, which are currently in the clinical trial stage [1].

There is increasing interest in exploring the molecular epidemiology of genetically encoded virulence and resistance factors in Shigella, as this provides information on the severity of infection, transmission and the pathogen's response to antimicrobials. In Shigella spp., the virulence and resistance determinants are mainly located on mobile genetic elements (MGEs) such as plasmids, insertion sequences, integrons, pathogenicity islands and bacteriophages. Horizontal gene transfer (HGT) of these elements is an important driver of bacterial evolution [4]. Through HGT, the pathogen enhances its ability to establish infection and acquires resistance that allows it to outcompete susceptible bacteria in the gut, by transferring genes between commensal and other pathogenic bacteria that are circulating locally [5, 6]. These MGEs can be predicted from whole genome sequencing (WGS) data through bioinformatics analysis. Recent advances in whole genome sequencing methodologies have had a major impact on bacterial genome-wide studies and on the epidemiological analysis of bacterial pathogens.

In this study, we report the first complete genomes of an S. flexneri serotype 2a strain and an S. sonnei strain, obtained using a hybrid assembly approach that combines long-read MinION (Oxford Nanopore Technologies) and short-read Ion Torrent 400 bp sequencing platforms. The availability of complete genomes of Shigella clinical strains, and their subsequent analysis, provides a better understanding of their genome characteristics, including virulence, resistance and mobile genetic elements.

Materials and methods

Bacterial isolates

The two clinical Shigella strains sequenced, S. flexneri 2a (FC906) and S. sonnei (FC1653), were isolated from stool specimens at the Department of Clinical Microbiology, Christian Medical College, Vellore, India.

Genome sequencing

Genomic DNA was extracted using the QIAamp DNA Mini Kit (QIAGEN, Hilden, Germany) according to the manufacturer’s instructions. DNA quality and quantity were assessed using Nanodrop spectrophotometry (Thermofisher, USA) and Qubit 3.0 (Thermofisher, USA), respectively. To obtain a closed genome, a hybrid approach using long-read MinION and short-read IonTorrent sequencing was performed as described previously [7]. Briefly, short-read sequencing was performed with 400-bp read chemistry using an IonTorrent™ Personal Genome Machine™ (PGM) (Life Technologies, Carlsbad, CA) as per the manufacturer’s instructions. Long-read sequencing was performed using the SQK-LSK108 Kit R9 version (Oxford Nanopore Technologies, Oxford, UK) with the 1D sequencing method according to the manufacturer’s protocol.

Assembly and annotation

The Fast5 files generated from MinION sequencing were base called with Albacore 2.0.1 (https://nanoporetech.com/about-us/news/new-basecaller-now-performs-raw-basecalling-improved-sequencing-accuracy). Canu 1.7 was used for read error correction and assembly, with a genome size of 3.0 m as input [8]. The quality of the MinION reads was assessed using MinIONQC (https://github.com/roblanf/minion_qc). To increase the accuracy and completeness of the genome, we performed hybrid assembly using both IonTorrent and MinION reads with Unicycler (v0.4.7) [9]. By default, Unicycler uses SPAdes [10] to assemble the short reads with different k-mers and filters out the low-depth regions, along with error correction and quality checks. Subsequently, it trims and generates the short-read assembly graph. In addition, it uses Miniasm [11] and Racon [12] to assemble the MinION long reads, which are then used to bridge the assembly graph, resolving the genome repeats and producing a complete genome assembly. Finally, multiple rounds of short-read polishing were performed with Pilon [13] to reduce the base-level errors in the long-read assembly.
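
For orientation, the assembly steps described above might be expressed as the following command lines. This is a minimal sketch and not the exact commands used in the study: the file names, output directories, the prefix and the path to the Pilon jar are hypothetical placeholders, and only the genome size value (3.0 m) is taken from the text. The helper only builds argument lists; it does not run the tools.

```python
from typing import Dict, List


def build_assembly_commands(
    nanopore_fastq: str,
    iontorrent_fastq: str,
    genome_size: str = "3.0m",          # value stated in the text
    pilon_jar: str = "pilon.jar",       # hypothetical path
    short_read_bam: str = "short.bam",  # hypothetical alignment of short reads
) -> Dict[str, List[str]]:
    """Return argument lists for the long-read, hybrid and polishing steps.

    All file and directory names are illustrative placeholders.
    """
    return {
        # Canu: correct, trim and assemble raw Nanopore reads.
        "canu": [
            "canu",
            "-p", "shigella",
            "-d", "canu_out",
            f"genomeSize={genome_size}",
            "-nanopore-raw", nanopore_fastq,
        ],
        # Unicycler: hybrid assembly. -s takes unpaired short reads
        # (IonTorrent reads are single-end), -l takes long reads.
        "unicycler": [
            "unicycler",
            "-s", iontorrent_fastq,
            "-l", nanopore_fastq,
            "-o", "unicycler_out",
        ],
        # Pilon: polish an assembly with aligned unpaired short reads.
        "pilon": [
            "java", "-jar", pilon_jar,
            "--genome", "unicycler_out/assembly.fasta",
            "--unpaired", short_read_bam,
            "--output", "polished",
        ],
    }


if __name__ == "__main__":
    for name, argv in build_assembly_commands("ont.fastq", "ion.fastq").items():
        print(name, " ".join(argv))
```

Each list could be passed to `subprocess.run` once the corresponding tool is installed; repeating the Pilon step with a fresh alignment would correspond to the multiple polishing rounds mentioned above.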

After assembly, the genomes were annotated using the NCBI Prokaryotic Genome Annotation Pipeline (PGAP). Virulence and antimicrobial resistance genes (ARG) were detected in silico using the VirulenceFinder (https://cge.cbs.dtu.dk/services/VirulenceFinder/) [14] and ResFinder (https://cge.cbs.dtu.dk/services/ResFinder/) databases, respectively, with a 90% threshold for identity and a minimum length coverage of 60% [15]. The sequence types of the isolates were determined using the MLST 2.0 (Multi Locus Sequence Typing) tool (https://cge.cbs.dtu.dk//services/MLST/) [16]. Shigella PAIs were compared with the reference sequences through BLASTn and visualized using Easyfig [17]. The genomes were screened for prophages using the PHAST tool [18]. ISsaga was used to predict the number of insertion sequences in the genome (https://www-issaga.biotoul.fr/issaga_index.php) [19].
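
To make the identity and coverage thresholds concrete, the sketch below filters candidate gene hits the way such cut-offs are typically applied. It is an illustration under stated assumptions rather than the internal logic of VirulenceFinder or ResFinder: the `Hit` record, its field names and the example values are hypothetical, and coverage is assumed to mean the aligned length as a percentage of the reference gene length.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One candidate match between the genome and a database gene (hypothetical)."""
    gene: str
    identity: float       # percent identity of the alignment
    aligned_length: int   # number of reference-gene bases covered
    gene_length: int      # full length of the reference gene

    @property
    def coverage(self) -> float:
        """Percentage of the reference gene covered by the alignment."""
        return 100.0 * self.aligned_length / self.gene_length


def filter_hits(
    hits: List[Hit],
    min_identity: float = 90.0,   # threshold stated in the text
    min_coverage: float = 60.0,   # threshold stated in the text
) -> List[Hit]:
    """Keep hits that meet both the identity and the coverage threshold."""
    return [
        h for h in hits
        if h.identity >= min_identity and h.coverage >= min_coverage
    ]


if __name__ == "__main__":
    # Made-up hits, for illustration only.
    example = [
        Hit("geneA", identity=99.5, aligned_length=900, gene_length=1000),
        Hit("geneB", identity=85.0, aligned_length=1000, gene_length=1000),
        Hit("geneC", identity=95.0, aligned_length=500, gene_length=1000),
    ]
    for hit in filter_hits(example):
        print(hit.gene, hit.identity, hit.coverage)
```

Under these assumptions a hit must pass both conditions, so a near-identical but short fragment (such as the hypothetical geneC) would be excluded.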

Quality assurance

Species confirmation was performed by biochemical tests (motility, urea, citrate, indole, triple sugar iron) and species-specific PCR [20, 21]. A pure isolated colony was used for genomic DNA extraction. Strain identification was confirmed through BLAST annotation against the NCBI database, and the species was predicted using KmerFinder, available at the Center for Genomic Epidemiology.

Results and discussion

Genome features

The hybrid assembly approach provided a complete single chromosome for S. flexneri (FC906), and a chromosome and three plasmids of 8401 bp, 6015 bp and 2690 bp for S. sonnei (FC1653). On BLAST analysis, the plasmids showed 100%, 99.7% and 100% similarity to the previously identified plasmids S. sonnei FDAARGOS_524 plasmid unnamed2, S. sonnei IDH01791 plasmid pSSE3 and S. sonnei CFSAN030807 plasmid pCFSAN030807_8, respectively. The genetic content of each plasmid is compared with that of its respective reference plasmid in Fig. 1a–c. This approach facilitates complete genome analysis of clinical strains, especially for studying the structural arrangement of mobile genetic elements, which play a major role in AMR dissemination. The genome features of the sequenced isolates are given in Table 1.

Fig. 1

a Circular representation of unnamed plasmid 1, pSS1653 carrying AMR genes identified in S. sonnei (red color indicates AMR genes, green color indicates mobile elements, other CDS shown in blue color). The direction of arrows indicates the orientation of open reading frames (ORFs). b Circular representation of genetic arrangement of unnamed plasmid 2 identified in S. sonnei (blue and green color denotes CDS and reference sequence respectively). The direction of arrows indicates the orientation of open reading frames (ORFs). c Circular representation of genetic arrangement of unnamed plasmid 2 identified in S. sonnei (dark blue and light blue color denotes CDS and reference sequence respectively). The direction of arrows indicates the orientation of open reading frames (ORFs)

Table 1 Genomic features and predicted insertion sequence elements of S. flexneri (FC906) and S. sonnei (FC1653) by hybrid assembly approach

The annotated chromosome of FC906 has been deposited in GenBank under accession number CP037996. For FC1653, the annotated chromosome and plasmids have been deposited under accession numbers CP037997 and CP037998, CP037999, CP038000, respectively.

Virulence and resistance determinants

The S. flexneri genome possesses virulence genes such as invasion plasmid antigen (ipaH), long polar fimbriae (lpfA), and the serine protease autotransporter proteins (pic and sigA), which belong to the SPATE family. Similarly, the S. sonnei genome carried invasion plasmid antigen (ipaH), long polar fimbriae (lpfA), enterotoxin ShET-2 (senB) and serine protease autotransporter protein (sigA). Generally, the ipaH family genes are present in multiple copies on both the virulence plasmid and the chromosome of Shigella genomes [22]. However, in the sequenced isolates the gene was identified on the chromosome.

Further, the toxin genes that belong to the SPATE family are commonly categorized into 2 classes. The gene sigA belongs to class 1 and is toxic to epithelial cells, whereas the pic gene is non-toxic and usually involved in colonization. These were first reported in S. flexneri serotype 2a, which is in accordance with the present study [23]. In addition, the gene encoding Shigella enterotoxin 2, identified in S. sonnei, is reported to be involved in the invasion process and to play an important role in the transport of electrolytes [24].

The genomes were also found to contain multiple resistance genes conferring resistance to streptomycin, beta-lactams, tetracycline, trimethoprim/sulfamethoxazole, aminoglycosides and chloramphenicol. Resistance genes such as aadA1, blaOXA-1, tetB, dfrA1, and catA1 were identified in the S. flexneri chromosome. In S. sonnei, the dfrA1 gene was identified in the chromosome, while the genes sulII, aph(6)-Id, aph(3’’)-Ib and tet(A) were identified in plasmid 1, herein named pSS1653. These are acquired resistance genes commonly reported among Shigella spp. On mutation analysis of the quinolone resistance determining region (QRDR), S. flexneri had double mutations in gyrA (S83L and D87N) and a single mutation in parC (S80I). Similarly, S. sonnei had mutations S83L and D87G in gyrA and S80I in parC. No mutations were observed in the gyrB gene. These mutations are commonly associated with fluoroquinolone resistance in Shigella spp., as reported in previous studies [25,26,27].
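
The substitution notation used above (for example S83L: serine at position 83 replaced by leucine) can be derived by comparing a query protein with a reference at the positions of interest. The following is a minimal sketch of that comparison, not the tool used in the study. It assumes the two protein sequences are already aligned and of equal length with no gaps, and the sequences in the example are short made-up strings rather than real GyrA or ParC sequences.

```python
import re
from typing import List, Tuple


def parse_substitution(code: str) -> Tuple[str, int, str]:
    """Split a code such as 'S83L' into (reference residue, position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def call_substitutions(reference: str, query: str, positions: List[int]) -> List[str]:
    """Report substitutions at the given 1-based positions.

    Assumes `reference` and `query` are pre-aligned, gap-free protein
    sequences of equal length (a simplifying assumption).
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for pos in positions:
        ref_aa, qry_aa = reference[pos - 1], query[pos - 1]
        if ref_aa != qry_aa:
            calls.append(f"{ref_aa}{pos}{qry_aa}")
    return calls


if __name__ == "__main__":
    # Toy sequences for illustration only (not real QRDR sequences).
    toy_reference = "ACDEFGHIKL"
    toy_query = "ACLEFGHIKL"
    print(call_substitutions(toy_reference, toy_query, positions=[3, 5]))
    print(parse_substitution("S83L"))
```

In practice the positions checked would be the QRDR codons named in the text (83 and 87 in GyrA, 80 in ParC), after aligning each translated gene to a susceptible reference.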

Mobile genetic elements and pathogenicity island

Mobile elements such as bacteriophages, integrons, IS elements and PAIs are the major drivers of Shigella genome evolution and plasticity. They play a crucial role in pathogen virulence and in the spread of resistance. Analysis revealed the presence of class 1 integrons in S. flexneri and no integron in S. sonnei. In addition, insertion sequence (IS) elements in Shigella are known to contribute to antibiotic resistance and to the evolution of the pathogen [28]. Shigella genomes naturally harbour hundreds of IS elements, which have inactivated genes (forming pseudogenes) either through IS-mediated interruption or through IS-mediated genome rearrangement. This inactivation of genes hinders the ability of Shigella to cause disease in humans [28, 29]. In this study, 735 and 857 pseudogenes were identified in S. flexneri and S. sonnei, respectively. A total of 391 and 535 IS elements were predicted in the S. flexneri and S. sonnei genomes, respectively. The most common family identified in both genomes was the IS1 family, accounting for approximately 29% and 32% of the IS elements, followed by the IS3_ssgr_IS3 family in both S. flexneri and S. sonnei. The predicted IS elements are given in Table 1.
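
The family proportions quoted above are simple tallies over the list of predicted IS elements. As a minimal sketch, assuming the predictions are available as one family label per element (the list in the example is made up and much shorter than the real output), the percentages could be computed as follows.

```python
from collections import Counter
from typing import Dict, Iterable


def family_percentages(families: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of elements in each family, most common first."""
    counts = Counter(families)
    total = sum(counts.values())
    return {
        family: 100.0 * n / total
        for family, n in counts.most_common()
    }


if __name__ == "__main__":
    # Hypothetical labels for illustration; not the study's data.
    predicted = ["IS1", "IS1", "IS1", "IS3_ssgr_IS3"]
    for family, pct in family_percentages(predicted).items():
        print(f"{family}: {pct:.1f}%")
```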

In Shigella, serotype conversion is generally mediated by bacteriophages [30]. The hybrid assembly analysis revealed 15 phage regions (8 intact, 4 incomplete, 3 questionable) in S. flexneri. Similarly, 15 phage regions (5 intact, 6 incomplete and 4 questionable) were identified in S. sonnei. The phage regions cover approximately 10% and 7% of the entire chromosome of S. flexneri and S. sonnei, respectively. In the third phage region of the S. flexneri chromosome, an intact SfII bacteriophage was identified, which is responsible for conferring serotype 2a. The details of the identified prophages, including length, position, number of CDS and GC content, are provided in Tables 2 and 3.
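
The share of the chromosome occupied by prophages can be derived from the region coordinates reported by a prophage prediction tool. The sketch below shows one way to do this, assuming 1-based inclusive coordinates and merging overlapping regions so that no base is counted twice. The coordinates and chromosome length in the example are hypothetical and are not taken from Tables 2 and 3.

```python
from typing import List, Tuple


def covered_fraction(regions: List[Tuple[int, int]], chromosome_length: int) -> float:
    """Fraction of the chromosome covered by the given regions.

    Regions are (start, end) pairs with 1-based inclusive coordinates.
    Overlapping regions are merged before counting.
    """
    merged: List[List[int]] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    covered = sum(end - start + 1 for start, end in merged)
    return covered / chromosome_length


if __name__ == "__main__":
    # Hypothetical coordinates for illustration only.
    toy_regions = [(1, 100), (51, 150), (301, 400)]
    print(covered_fraction(toy_regions, chromosome_length=1000))
```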

Table 2 Prophage content of S. flexneri (FC906) analyzed using PHAST tool
Table 3 Prophage content of S. sonnei (FC1653) analyzed using PHAST tool

Pathogenicity islands are clusters of mobile elements that encode various virulence factors [30]. PAIs such as SHI-1 (also called she), SHI-2 and the Shigella resistance locus (SRL) were identified in the S. flexneri genome. SHI-1 contains virulence genes such as pic and sigA. SHI-2 comprises genes encoding an aerobactin operon, an iron acquisition siderophore system, transposases and several hypothetical proteins that are associated with increased virulence of the pathogen [30]. The resistance locus, SRL, contains the aadA1, blaOXA-1, cat and tet genes, conferring resistance to streptomycin, beta-lactams, chloramphenicol and tetracyclines.

In contrast, SHI-1 was absent in S. sonnei, which possessed only the SHI-2 island. This could be due to the ability of SHI-1 to undergo spontaneous and specific excision via site-specific recombination [31]. This suggests that S. sonnei might have lost its SHI-1 region in the course of evolution, making room for other genes important for its survival. These pathogenicity islands are reported to be associated with phage integrases, suggesting a role for phages in the evolution of Shigella [32]. The BLAST comparison of these islands with the reference sequences is shown in Figs. 2 and 3.

Fig. 2

BLAST comparison of Shigella pathogenicity islands identified in S. flexneri (FC906) against reference sequences using Easyfig. a SHI-1 pathogenicity island, b SHI-2 pathogenicity island, c Shigella resistance locus (SRL) carrying antimicrobial resistance genes. Vertical blocks between the two sequences indicate regions of shared similarity, shaded according to BLASTn (pink shading indicates matches in the same direction and red indicates inverted matches)

Fig. 3

BLAST comparison of the Shigella pathogenicity island (SHI-2) identified in S. sonnei against the reference sequence using Easyfig. Vertical blocks between the two sequences indicate regions of shared similarity, shaded according to BLASTn (pink shading indicates matches in the same direction and red indicates inverted matches)

The present study provides insights into the genetic content and complete structure of various mobile genetic elements that carry virulence and resistance determinants. Although whole genome sequencing is a valuable tool for studying bacterial genomes, short-read assembly (IonTorrent) alone could provide only limited information, particularly on complete mobile genetic elements. Long-read assembly (MinION) could generate a closed genome with enhanced information on the structural arrangement of mobile elements, but with a high error rate. The hybrid assembly approach, combining short and long reads, provided a complete genome with an acceptable error rate (< 10%). Use of this novel approach in the present study thus helped to identify the complete sequence of plasmid pSS1653, with structural genetic information on AMR genes such as sulII, tetA, tetR, aph(6)-Id and aph(3’’)-Ib. The presence of AMR genes on mobile elements in this human-restricted enteric pathogen poses a potential threat of dissemination to other gut pathogens. Further, the limited information available on Shigella at the genome level calls for genomic surveillance studies to monitor the evolutionary trends and genome dynamics of emerging and existing resistant clones.