Pyrosequencing™ analysis of the gyrB gene to differentiate bacteria responsible for diarrheal diseases
- Cite this article as:
- Hou, X., Cao, Q., Jia, H. et al. Eur J Clin Microbiol Infect Dis (2008) 27: 587. doi:10.1007/s10096-008-0477-7
Pathogens causing acute diarrhea include a large variety of species from Enterobacteriaceae and Vibrionaceae. A method based on pyrosequencing™ was used here to differentiate bacteria commonly associated with diarrhea in China; the method is targeted to a partial amplicon of the gyrB gene, which encodes the B subunit of DNA gyrase. Twenty-eight specific polymorphic positions were identified from sequence alignment of a large sequence dataset and targeted using 17 sequencing primers. Of 95 isolates tested, belonging to 13 species within 7 genera, most could be identified to the species level; O157 type could be differentiated from other E. coli types; Salmonella enterica subsp. enterica could be identified at the serotype level; the genus Shigella, except for S. boydii and S. dysenteriae, could also be identified. All these isolates were also subjected to conventional sequencing of a relatively long (~1.2 kb) region of gyrB DNA; these results confirmed those with pyrosequencing™. Twenty-two fecal samples were surveyed, the results of which were concordant with culture-based bacterial identification, and the pathogen detection limit with simulated stool specimens was 10⁴ CFU/ml. DNA from different pathogens was also mixed to simulate a case of multibacterial infection, and the generated signals correlated well with the mix ratio. In summary, the gyrB-based pyrosequencing™ approach proved to have significant reliability and discriminatory power for enteropathogenic bacterial identification and provided a fast and effective method for clinical diagnosis.
Acute diarrhea, with a variety of clinical symptoms ranging in severity from moderate or self-limiting to life-threatening, is ordinarily caused by any one of a number of pathogenic microorganisms such as Salmonella enterica, Shigella, pathogenic strains of Escherichia coli, and Vibrionaceae spp. [1, 2]. It remains a serious public health problem in most developing countries. Within different geographic regions, the prevailing enteropathogens may be different, presumably as a result of varied environment and lifestyle. In China, Shigella spp. are the most common diarrheal bacteria [3–5], responsible for over 400,000 shigellosis cases in 2006 and ranking in the top five infectious diseases in China. The remaining causative agents, in order of prevalence, are E. coli, Vibrio parahaemolyticus, Salmonella enterica, Vibrio cholerae, Aeromonas spp., and Yersinia enterocolitica [3, 6]. Despite prophylactic measures and improved sanitation, the morbidity rate of diarrheal diseases remains very high, especially in children and travelers. Moreover, these diseases often appear as local outbreaks, as in the case of the epidemic cholera that occurred in some regions of southern China in 2005. In this situation, rapid identification of causative bacteria with a high degree of specificity and sensitivity is essential to outbreak surveillance and investigation and is also helpful in the implementation of proper clinical therapies.
Generally, there are two different approaches to pathogen identification. Phenotypic methods have been used traditionally, in which various phenotypic traits such as colonial morphology or biochemical reactions are observed for the characterization of organisms. The alternative is genotypic identification, in which specific bacterial gene sequences are searched for by various means. Examples of the genotypic approach are restriction fragment length polymorphism (RFLP), multilocus enzyme electrophoresis (MLEE), and probe hybridization. All of the genotyping methods are based essentially on sequence differences; thus direct sequencing is the most basic method among these.
Pyrosequencing™ technology (Biotage, Uppsala, Sweden) is a real-time DNA sequencing method, based upon the release of pyrophosphate when any nucleotide triphosphate is successfully added to the growing polynucleotide chain. While read lengths (less than 50 bp) are still much shorter than those achieved with Sanger chain termination chemistry, the hardware has undergone major technical developments [8, 9], which makes it particularly advantageous for the fast identification of short DNA sequences and detection of single nucleotide polymorphisms (SNPs).
The 16S rRNA gene, a polymorphic bacterial gene, is commonly exploited for identification of bacterial strains and species. Jordan et al. have demonstrated that pyrosequencing of an informative stretch of 15 bases in the 16S rRNA gene is an accurate means for differentiating between bacteria responsible for neonatal sepsis. However, the ribosomal genes are found to be quite highly conserved, and thus uninformative in closely related species. When comparison of such strains is of interest, protein coding genes with a higher variability, such as ATP synthase (atpD), RNA polymerase (rpoD), translation initiation factor (infB), and heat shock proteins (groEL), are usually more useful.
The gyrB gene, encoding the B subunit of DNA gyrase (topoisomerase type II), is a useful alternative marker to 16S rRNA [12–14]. DNA gyrase regulates supercoiling of double-stranded DNA, which is necessary for DNA replication. The enzyme is ubiquitous among bacterial species, and its gene has been shown to be an excellent target for differentiating between bacterial species of the families Enterobacteriaceae [15, 16] and Vibrionaceae [17, 18]. In our previous study, a 1.2-kb-long gyrB sequence analysis of bacteria involved in diarrhea in China proved this gene to be suitable for phylogenetic analysis (unpublished data). In the present study, differentiation of the causative agents of diarrhea was achieved by pyrosequencing polymerase chain reaction (PCR) amplicons from species- (or serotype-) specific, polymorphic positions of the gyrB gene, using 17 sequencing primers.
Materials and methods
Primer design for pyrosequencing
A comprehensive dataset of gyrB sequences of bacteria associated with diarrhea in China was constructed from entries selected from GenBank (http://www.ncbi.nlm.nih.gov), the Identification and Classification of Bacteria (ICB) database (http://seasquirt.mbio.co.jp/icb/) and from in-house sequences generated de novo for a previous phylogenetic analysis. This dataset served to guide primer design. Altogether 226 sequences of E. coli and Shigella spp., 78 of Aeromonas spp., 262 of Salmonella, and 29 of other species investigated here were included. A complete list of accession numbers is available on request.
Table 1 Polymorphic site patterns within the analyzed gyrB regions as generated by pyrosequencing
Sequence to analyze by pyrosequencing
Ec, Sh, Sa, Ye, Vc, Vp
T A: Vp
G A: Vc
G G: Ps
C C: Ae
T C C: Ah
G G: Ac
Ec, Sh, Sa
GC: Ec; Sh
Ec, Sh, Sa
A: SaC; SaTm; SaPb; SaT
G: SaE; SaPa
Ec, Sh, Sa
C: Ec; Sh
C: SaTm; SaPa
T: SaE; SaTm
Ec, Sh, Sa, Ae, Ps
T T: Sf
Ec, Sh, Sa
Ec, Sh, Sa
G: some Sb; Sd
A T: Ye
Table 2 Primers used in two-round PCR amplification
Short sequences (~10 bp long), encompassing these specific polymorphic sites and their flanking regions, were regarded as targets for pyrosequencing, and the corresponding sequencing primers were designed with the aid of Biotage’s primer design software; among the primer candidates, those with similar melting temperature (Tm) values were selected.
All of the sequencing primers generated had their 3′ end located 0–3 bp ahead of their targeted positions, and, if any positions included were not stringently identical among all targeted species (or serotypes), degenerate bases were employed at these positions (Table 1).
Bacterial strains and preparation of DNA templates
Table 3 Summary of pyrosequencing results of 95 pure cultures analyzed in this study
Species or serotype
Salmonella enterica subsp. enterica
S. paratyphi A
S. paratyphi B
Sample preparation of pathogens in a background of normal fecal flora
Bacterial mixtures isolated from a healthy adult fecal sample were stored at -75°C until needed; 10-μl aliquots of the suspension of normal fecal flora were individually inoculated into each test tube of blood broth. To these same broths, 10-μl aliquots of each culture of the preserved Shigella flexneri, S. sonnei, and Salmonella enteritidis strains were added separately. The mixtures were cultured overnight at 37°C. A portion (1.5 ml) of each culture was used for extraction of genomic DNA. These three species were chosen because they are prevalent agents of diarrhea in China.
DNA template preparation from fecal specimens
Fecal samples obtained from 2 healthy adult volunteers and 20 acute diarrhea patients were collected from the First Affiliated Hospital, Zhejiang University and Hangzhou First People’s Hospital from July to October 2007. DNA was extracted with a QIAamp Mini Stool Kit (QIAGEN GmbH, Hilden, Germany).
Nested PCR amplification
Two sets of primers were used to carry out nested PCR amplifications: gyrB broad-range primers UP-1/2r [14, 19] for the first round of amplification and the All-f/r pairs specified in Table 2 for the second round; the second round generated fragments short enough for pyrosequencing. The inner primer (All-r) was biotinylated so that single-stranded DNA could be captured as the pyrosequencing template.
The two rounds of PCR amplification were carried out in a final reaction volume of 50 μl, containing 1.25 U ExTaq DNA polymerase, 200 μM of each deoxynucleotide triphosphate, 5 μl 10 × PCR buffer, and 16 pmol of each primer [all from Takara Biotechnology (Dalian) Ltd., China]; 1 μl of DNA preparation was used as template in a first round amplification and 1 μl (diluted 1:1000) of the resultant amplicon solution was used as template in the second round.
The first round of PCR amplification was performed with 35 cycles at 96°C for 1 min, 62°C for 1 min, and 72°C for 2 min, followed by a final incubation at 72°C for 7 min. The second PCR amplification was performed using a “touchdown” (TD) program with a spanned annealing temperature range (65°C descending to 45°C, two cycles per 1°C-step), with the final 20 cycles at 54°C. Each cycle was composed of denaturation at 95°C for 30 s, annealing at the corresponding TD temperature for 30 s, and extension at 72°C for 45 s. In situations where degenerate primers are used, TD PCR inherently favors amplification of the desired template to the exclusion of artifactual amplicons. The PCR fragments were examined by standard agarose gel electrophoresis (1.5%) with ethidium bromide staining.
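The touchdown program described above can be written out cycle by cycle. The following is a minimal sketch that takes the stated numbers at face value (65°C descending to 45°C, two cycles per 1°C step, then 20 cycles at 54°C); the function name, its parameters, and the assumption that the 45°C end point is itself included are illustrative choices, not details given in the text.

```python
def touchdown_schedule(start_c=65, end_c=45, cycles_per_step=2,
                       final_cycles=20, final_c=54):
    """Return the annealing temperature (deg C) for every PCR cycle, in order."""
    schedule = []
    # Touchdown phase: drop 1 deg C per step; the end point is included (assumption).
    for temp in range(start_c, end_c - 1, -1):
        schedule.extend([temp] * cycles_per_step)
    # Final phase at a fixed annealing temperature.
    schedule.extend([final_c] * final_cycles)
    return schedule


schedule = touchdown_schedule()
print(len(schedule), "cycles; first four annealing temperatures:", schedule[:4])
```

Each cycle in the list would still consist of denaturation at 95°C for 30 s, annealing at the listed temperature for 30 s, and extension at 72°C for 45 s.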
Sanger DNA sequencing
The outer, 1.2-kb amplicons of all strains were fully sequenced on an ABI 3730 DNA Analyzer (Applied Biosystems) by Invitrogen Biotechnology Company (Carlsbad, CA, USA). Before sequencing, the amplicon was purified using the Gel Extraction Mini Kit (Watson Biotechnologies, China). The sequencing primers were UP1S (5′-GAAGTCATCATGACCGTTCTGCA-3′) and UP2rS (5′-AGCAGGGTACGGATGTGCGAGCC-3′).
The pyrosequencing assay was performed using a “Pyro Gold Reagents” kit and the PyroMark ID analyzer (Biotage, Uppsala, Sweden) according to the manufacturer’s instructions. The biotinylated PCR product (40 μl) was captured using streptavidin-coated sepharose beads (Amersham Biosciences, Little Chalfont, UK), and the unlabeled forward strand was denatured and removed using the “Vacuum Prep Tool” (Biotage, Uppsala, Sweden) according to the manufacturer’s instructions. After washing, the resulting single-strand DNA was transferred to a 96-well microtiter plate and used as a template for the pyrosequencing assay with 1.3 pmol sequencing primer per reaction. Seventeen separate reactions (i.e., one reaction per sequencing primer) were performed with each sample.
In the sequencing process, nucleotides were dispensed according to the “dispensation order,” which the PyroMark ID software generates automatically from the “sequence to analyze” settings (Table 1); thus only selected bases were added to interrogate the sequence. This sequence segment is possessed only by a limited group of the species to be identified (see Table 1, column 2), with degenerate positions to be elucidated. Any species not possessing this sequence will generate only a null signal in pyrosequencing, which ensures the specificity of the assay to some extent (see “Discussion”).
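The relationship between a dispensation order and the resulting signal can be illustrated with a toy model. The sketch below is a simplification written for this purpose, not the PyroMark ID algorithm: it assumes ideal chemistry, in which each dispensed nucleotide is incorporated as many times as it matches the next bases of the strand being built, and the peak height equals that count. The function name and the example sequences are hypothetical.

```python
def simulate_pyrogram(strand_to_build, dispensation_order):
    """Idealized peak heights, one per dispensed nucleotide.

    strand_to_build    -- bases the polymerase adds downstream of the primer (5'->3')
    dispensation_order -- nucleotides in the order they are dispensed
    Returns (peaks, fully_read).
    """
    position = 0
    peaks = []
    for nucleotide in dispensation_order:
        run_length = 0
        # A dispensed base is incorporated for as long as it matches the next base.
        while (position < len(strand_to_build)
               and strand_to_build[position] == nucleotide):
            run_length += 1
            position += 1
        peaks.append(run_length)
    fully_read = position == len(strand_to_build)
    return peaks, fully_read


# Hypothetical target matching the expected sequence, then a non-matching one.
print(simulate_pyrogram("GGAT", "CGATC"))
print(simulate_pyrogram("TTCA", "CGATC"))
```

In this model a template that does not follow the expected sequence yields a peak pattern that departs from the expected one and is not read to the end, which is the kind of mismatch the “sequence to analyze” setting is meant to expose.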
Detection limit of the pyrosequencing assay
The simulated (spiked) stool specimens for the assessment of detection limit of the assay were prepared as follows. A serial dilution of S. sonnei culture in the range of 10⁷ to 10¹ CFU/ml was prepared and mixed with 0.2 g of stool specimens from a healthy adult volunteer. DNA was extracted with a QIAamp Mini Stool Kit (QIAGEN GmbH, Hilden, Germany) for two-round PCR amplification and pyrosequencing.
Detection of simulated multibacterial infection
Since it is difficult to collect a fecal sample with typical multibacterial infection, a simulation was performed by mixing PCR amplicons of S. typhimurium and S. paratyphi A. The mixture for simulating complex infection was prepared as follows. Biotinylated PCR products of the S. typhimurium and S. paratyphi A isolates were each quantified using a NanoDrop ND-1000 Spectrophotometer (NanoDrop Technologies, Inc., Wilmington, DE, USA) and were mixed 1:1 for pyrosequencing using the primer S1047. To analyze the pyrosequencing results, the “AQ mode” (allele quantification) of the software was used.
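The idea behind allele quantification can be sketched as a simple proportion: at a polymorphic position, each allele's share of the mixture is estimated from its peak height relative to the summed peak heights of all alleles at that position. The code below is a simplified stand-in under that assumption and does not reproduce the AQ mode of the instrument software; the function name, the allele labels, and the peak heights are hypothetical.

```python
def allele_fractions(peak_heights):
    """Estimate allele fractions from peak heights at one polymorphic position.

    peak_heights -- mapping of allele label to background-corrected peak height
    """
    total = sum(peak_heights.values())
    if total <= 0:
        raise ValueError("no signal at the polymorphic position")
    return {allele: height / total for allele, height in peak_heights.items()}


# Hypothetical peak heights for a mixture prepared at a nominal 1:1 ratio.
print(allele_fractions({"allele_1": 21.0, "allele_2": 19.0}))
```

A result close to 0.5 for each allele would be consistent with a 1:1 mixture of the two amplicons.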
Assessment of PCR amplification
The effectiveness of the primers was assessed with DNA extracts from 95 stored bacterial isolates representing 13 related species within 7 genera of Enterobacteriaceae and Vibrionaceae. An electrophoresis band at ~1.2 kb was produced by the first PCR reaction, and a ~380-bp fragment was generated in the second step, using the TD PCR program.
Pyrosequencing of DNA amplicons from bacterial isolates
A total of 95 isolates (see Table 3) were analyzed by pyrosequencing for 28 specific polymorphisms downstream of the 17 sequencing primers. Table 1 illustrates the targeted sequences and genotypes expected to be generated by each primer. Following 17 separately primed sequencing reactions, every isolate could be assigned an unambiguous species or serotype designation; these were in accordance with biochemical profiles for all isolates at species level (or serotype level for S. enterica subsp. enterica), except that three S. boydii and two S. dysenteriae strains were assigned to one group. In no case were there contradictions between redundant primer sets. This was the case, for example, with primers S1072 and S1047 for S. enterica subsp. enterica, as well as S924 and S1041 for S. sonnei. This kind of redundancy also serves as a quality control, making full use of sequence information and further validating the generated results.
Some species are differentiated by unique combinations of characters at several sites rather than one single variation. For example, with sequencing primer S1029, the combination of G(1029) and A(1032) was found to be species specific for V. cholerae. Another example is S. flexneri, which had a T at both positions (1098) and (1101).
In comparison to differentiation at species level, the differentiation of Salmonella serotypes was more complicated and required a two-step decision tree. Once a strain was identified as the S. enterica subsp. enterica, on the basis of results with primers S1072 and S1047, four other primers were required for further classification: S1002 for S. enteritidis, S981 for S. typhi, S996 for S. paratyphi B, S840 for S. choleraesuis and S. paratyphi A, and S1002 with S840 for S. typhimurium.
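The two-step primer panel described above can be held in a small lookup table. This sketch records only which primers the text names for each serotype; it deliberately does not encode the expected bases at each position, since those are given in Table 1 and not restated here. The names `FIRST_STEP`, `SECOND_STEP`, and the helper functions are illustrative.

```python
# Step 1: primers that establish S. enterica subsp. enterica.
FIRST_STEP = ("S1072", "S1047")

# Step 2: additional primers consulted for each serotype, as listed in the text.
SECOND_STEP = {
    "S. enteritidis": ("S1002",),
    "S. typhi": ("S981",),
    "S. paratyphi B": ("S996",),
    "S. choleraesuis": ("S840",),
    "S. paratyphi A": ("S840",),
    "S. typhimurium": ("S1002", "S840"),
}


def primers_needed(serotype):
    """All primers whose results are combined to call the given serotype."""
    return FIRST_STEP + SECOND_STEP[serotype]


def second_step_panel():
    """Distinct primers used in the second step, sorted by name."""
    return sorted({p for primers in SECOND_STEP.values() for p in primers})


print(primers_needed("S. typhimurium"))
print(second_step_panel())
```

The second-step panel contains four distinct primers, in line with the statement that four other primers were required for further classification.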
Even so, there are some limitations to the current assay in discriminating Salmonella serotypes. Employing the two-step approach described above, theoretically four serotypes, i.e., choleraesuis, paratyphi B, typhi, and typhimurium, could not be easily differentiated from other, uncommon serotypes, which are rarely isolated from fecal specimens in China, such as agona, mbandaka, and kedougou, etc. (cf. Supplementary Table 2). Considering the infrequency with which these serotypes are encountered in China, these limitations do not seriously compromise the validity of the current assay.
As E. coli is a constituent of normal intestinal flora, and because it is difficult to identify different types of E. coli causing diarrhea interrogating only housekeeping genes, most isolates classed as E. coli were identified just to the species level by excluding the possibility of Shigella in the E. coli/Shigella group. Primers S1072 and S1002 group E. coli and Shigella, but primers S1095, S924, S1041, and S975 permit logical exclusion of Shigella (Table 1). The exceptions were two specific groups with clinical significance—a specific polymorphism A(1002) for discriminating the extraintestinal pathogenic E. coli (ExPEC) from other E. coli (possibly enteropathogenic) and a unique T(1069) marker for O157 strains. The sequencing primer for ExPEC doubles as a negative control for intestinal E. coli. In this study, pyrosequencing through position 1002 of ExPEC strain ATCC25922 (nonenteropathogenic strain) revealed an adenine base, as was expected. In addition, three O157 strains were also correctly differentiated by pyrosequencing.
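The two E. coli markers named above lend themselves to a short rule. The sketch below assumes that an isolate has already been placed in E. coli (that is, Shigella has been excluded) and that the bases at positions 1002 and 1069 have been read; the function name, the input format, and the order in which the two markers are checked are assumptions made for illustration.

```python
def ecoli_subgroup(bases):
    """Assign an E. coli isolate to a subgroup from two gyrB positions.

    bases -- mapping of gyrB position to the base read there, e.g. {1002: "A"}
    """
    if bases.get(1069) == "T":   # T(1069): marker reported for O157 strains
        return "O157"
    if bases.get(1002) == "A":   # A(1002): marker reported for ExPEC
        return "ExPEC"
    return "E. coli (species level only)"


print(ecoli_subgroup({1002: "A", 1069: "C"}))
```

Any isolate carrying neither marker is reported at species level only, which mirrors how most E. coli isolates were handled in the study.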
Comparison of pyrosequencing results with Sanger sequencing
These variations determined by pyrosequencing for each of the tested isolates were identical to those obtained by Sanger sequencing. The identity of the pyrosequences was confirmed by BLASTing against the 1.2-kb gyrB gene clusters.
Pyrosequencing identification of Shigella spp. and S. enteritidis in a background of fecal flora
In a mixture with nonpathogens isolated from healthy adults’ stool, both S. flexneri and S. sonnei isolates were clearly identified by the combined results from primers S1095, S924, and S1041. S. enteritidis was also differentiated by the results from S1072, S1047, and S1002. Peak intensities and signal-to-noise ratios generated for these specific signatures were comparable, whether in the case of DNA extracted from pure culture or from fecal floral background, and could be unambiguously identified.
Pyrosequencing analysis of clinical fecal specimens
Sensitivity of detection with simulated stool specimens
Pyrosequencing results with simulated multibacterial infection
Discussion
In this study, we applied pyrosequencing using 17 sequencing primers to detect 28 variations in the gyrB gene in order to differentiate between 13 species responsible for diarrheal diseases. The 95 isolates investigated in this study were identified to the species or serotype level, except for three S. boydii and two S. dysenteriae strains, which share a common polymorphic position, G(975), and have no other polymorphisms by which they could be differentiated. This result was similar to the resolution obtained using the full sequence data of the 1.2-kb gyrB amplicons, where one S. dysenteriae strain was revealed to have a sequence completely identical to those of three S. boydii strains and 99.8% similar to that of another S. dysenteriae strain.
In the first pass of identification of pure cultures, discordant results between gyrB genotypes and biochemical profiles were observed in the following two cases: A. caviae vs A. hydrophila (one isolate) and V. parahaemolyticus vs A. hydrophila (one isolate). However, these discrepancies were resolved after a second test with the Vitek system, which showed the initial biochemical conclusion to be erroneous. This error demonstrates the reliability of the genotypic assay, especially with closely related enteropathogenic species of Vibrionaceae, which sometimes tend to be confused by phenotyping.
S. flexneri, S. sonnei, and S. enteritidis were clearly identified in a background of stool-isolated nonpathogens using pyrosequencing, which suggested that it would be possible to discriminate these three prevalent species associated with diarrhea by applying our PCR/pyrosequencing method directly to fecal samples. In a further investigation, therefore, we applied the method to fecal samples from diarrhea patients. Here, too, the pyrosequencing results were consistent with culture identification: 17 samples proved the infection to be due to V. parahaemolyticus, which is the major causative agent of food poisoning and acute diarrhea in Chinese coastal regions in summer, 2 due to S. sonnei, and 1 induced by A. hydrophila. Furthermore, samples spiked with graded amounts of pure cultures of S. sonnei indicated a limit of detection of about 10⁴ CFU/ml.
Fundamentally, pyrosequencing is just an alternative sequencing technique, comparable to Sanger dideoxy sequencing, but it has some advantages which make it particularly suitable for clinical diagnostic purposes. The clinical purpose is not to generate de novo sequence data, but to use preexisting sequence databases to define and focus on short regions of sequence harboring SNPs which can discriminate species and strains which overall are closely similar. Thus the short read length of pyrosequencing becomes an advantage, allowing one to skip over long stretches of identical, and thus uninformative, sequences. Moreover, these short reads are unambiguous from beginning to end, unlike Sanger results, where initial and final sections of a read tend to be noisy and are often discarded. Our application of Sanger sequencing in this study was not to augment results obtainable with pyrosequencing, but to corroborate them.
Another crucial advantage of pyrosequencing for the clinical laboratory over Sanger sequencing, and even over hybridization-based SNP detection, is its ability to handle, and quantify, mixed cultures. Even for hybridization-based detection assays, multiple infection is often a problem: because a probe may target a group of closely related species, or one species may be targeted by more than one probe, it is difficult to judge which species actually contributes to the detected signal. With the ability of allele frequency quantification, however, pyrosequencing not only detected the existence of S. typhimurium together with S. paratyphi A in our simulated case, but also gave a reasonable estimate for the composition of their DNA mixture. Pyrosequencing handles mixed bases well because the signals from different bases at the same locus are separated in time, so that each base is quantified against its own baseline.
Unlike a number of molecular identification projects using pyrosequencing [21–23] which have focused on a small group of very closely related organisms (e.g., typing within one genus or species), our study included a broad range of various genera, species, subspecies, and serotypes. It is this range that necessitated use of a number of different sequencing primers (17) to probe specific signatures, widely scattered among the sequences of these bacteria. These sequencing primers are ordinary, synthetic oligonucleotides without any extra modifications (such as 5′ primary amino groups and C6 spacers), thus helping to lower the cost of the assay. The method is also very flexible in scope: to detect a greater number of pathogens with known signatures, new primers corresponding to these signatures could be added; or vice versa, the set of 17 primers could be reduced in number when only a smaller number of bacteria were of interest.
Another technical problem accompanying a wide detection spectrum is that hardly any one universal PCR or sequencing primer can be designed to fit distantly related species without introducing excessive degeneracy. With increased numbers of degenerate bases, sequence data quality tends to suffer. Therefore, in this study, we used the criteria that the primers have no more than two degenerate bases and that no such bases occur within the last five bases at the 3′ end. In fact, most primers following the above rules worked well, generating sharp, unambiguous signals approximately 15–20 units higher than the baseline.
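The two degeneracy criteria are easy to check programmatically. The sketch below assumes primers are written 5′ to 3′ using the standard IUPAC ambiguity codes; the function name, parameter names, and example primers are hypothetical and do not correspond to any primer used in the study.

```python
# IUPAC codes that stand for more than one nucleotide.
DEGENERATE = set("RYSWKMBDHVN")


def passes_degeneracy_rule(primer, max_degenerate=2, clean_3prime=5):
    """True if the primer has at most `max_degenerate` degenerate bases
    and none within the last `clean_3prime` bases at its 3' end."""
    seq = primer.upper()
    n_degenerate = sum(base in DEGENERATE for base in seq)
    tail_is_clean = not any(base in DEGENERATE for base in seq[-clean_3prime:])
    return n_degenerate <= max_degenerate and tail_is_clean


# Hypothetical primers, written 5'->3'.
for primer in ("GARTCYATGACCGT", "GAATCATGACYGT", "GARTCYATNACCGT"):
    print(primer, passes_degeneracy_rule(primer))
```

The first example passes (two degenerate bases, none near the 3′ end); the second fails because a degenerate base falls within the last five bases, and the third fails because it carries three degenerate bases.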
The specificity of the assay as developed here does not reside solely in the pyrosequencing primers. There is a redundancy of selectivity at three levels: (1) The PCR primers ensure that only targeted species of Enterobacteriaceae and Vibrionaceae are amplified. (2) Most sequencing primers only hybridize to a small group of closely related species to be differentiated (see Table 1), rather than all investigated bacteria in this study (e.g., the primer S840 targeting the S. choleraesuis-specific signature will only hybridize to S. enterica subsp. enterica). (3) The “sequence to analyze” feature in SNP runs on the “PyroMark” software acts as the template for nucleotide incorporation, so that positive sequence signals are given only by a small group of species which share the expected sequence in the vicinity of a particular polymorphism (e.g., the “sequence to analyze” for primer S1072 is only shared by Enterobacteriaceae species investigated here except for Yersinia enterocolitica). Any species not conforming to these three levels of selectivity could hardly pass through the whole process of amplification, priming, and sequencing. This strongly militates against false positives in the assay.
Locating discriminative sequence variants is, of course, of key importance in sequence-based identification methods. Several probes described by Kakinuma, for Shigella and E. coli microarray detection, were initially incorporated into our analyses, but failed to identify most local isolates correctly and were subsequently eliminated. Further sequencing validated this finding, indicating those probes were only suitable for a limited number of strains of their corresponding species. This emphasizes the importance of considering as large a sequence dataset as possible for use in primer design. The dataset we inspected contained almost all currently available gyrB sequences which met certain quality criteria (long enough, no ambiguous bases, etc.), including both standard and clinical isolates. This sequence collection seemed to encompass all of the possible genotypes of our regional isolates.
Pyrosequencing is well suited to combine the coarse genotypic resolution provided by relatively conserved sequences, with the fine resolution revealed in those more variable regions by robust phylogenetic analysis of closely related species. It is reasonably rapid: the results described here are available within 8 h, from DNA extraction of isolates or fecal samples to pyrosequencing and postsequencing analysis. It is cost-effective: cost analysis of the current assay comes to about US $25 for each fecal sample. We believe the system is capable of rapidly and accurately tracking bacterial agents in outbreaks of diarrheal disease.
This work was supported by grants 2003C13015 and 021103128 from the Science and Technology Department of Zhejiang Province, China. We thank Barbara J. Chang for critical discussion and Robert Wohlhueter for his help in preparing the manuscript.