Introduction

Group B Streptococcus (GBS, Streptococcus agalactiae) was initially described as the cause of bovine mastitis [1]. In humans, GBS usually occurs in the genitourinary or gastrointestinal tract of adult males and females and is not pathogenic. In the birthing process, however, GBS from the genital tract of pregnant women can infect the lower gastrointestinal and upper respiratory tracts of newborns [2], resulting in neonatal sepsis and meningitis. Even with the appropriate antibiotic treatment, the mortality rates of late-onset GBS infection (age 7–90 days [3]) are high [4]. Additionally, during the past decade, GBS has been increasingly associated with skin and soft tissue infections and bacteremia in nonpregnant adults [5].

Capsular polysaccharides (CPS) are considered to play a vital role as virulence factors [6]. The sialic acid residue on the CPS accelerates the dissociation of the C3 convertase, thereby inhibiting the complement cascade and allowing the bacterium to escape detection by the host’s immune system. Based on specific sequences in the CPS gene cluster, GBS can be subdivided into ten types [7,8,9]. The virulence of different CPS types varies: five major CPS types (Ia, Ib, II, III, and V) account for 96% and 88% of the cases of invasive GBS infections in neonates and adults, respectively [10, 11]. Accordingly, as in Streptococcus pneumoniae, CPS is an essential target for GBS vaccine development [12]. Currently, no GBS vaccine is on the market. The vaccines under development mainly target CPS types Ia, Ib, II, III, and V [13,14,15].

It is well known that, in many bacterial pathogens, the use of vaccines can exert intense selective pressure on bacterial evolution and change the epidemiological pattern. In S. pneumoniae, for example, the introduction of a 7-valent pneumococcal conjugate vaccine (PCV7) was followed by a decline of vaccine-type (VT) and an increase of nonvaccine-type (NVT) pneumococci in disease and nasopharyngeal carriage [16]. This shift is achieved by capsule switching, which results from the recombination of essential CPS determinant genes [11, 17, 18]. GBS is not naturally transformable, and the rate of its capsular transformation shown by in vivo experiments is low [17]. Paradoxically, capsular switching has been frequently reported in GBS by a number of epidemiological studies, in particular in clonal complexes (CC) 1 and 17 [11, 18,19,20,21,22]. Because most of these studies were limited to isolates from a single city or country, the level of capsule switching on a global scale is still unclear. This information is of particular importance for the development of GBS vaccines because the propensity for capsule switching, if it really exists, could easily lead to “vaccine escape” once the monovalent vaccines come into use.

The pilus-like structure is another critical virulence factor and candidate vaccine target in GBS [23]. The structure consists of three components: a backbone pilus protein and two accessory proteins that are involved in bacterial binding to host cells [24]. The GBS pilus is encoded by two loci in different regions of the genome, namely pilus islands 1 and 2 (PI-1 and PI-2). The latter can be further divided into two distinct variants, PI-2a and PI-2b [25]. Because pilus islands are horizontally transferred elements, typing based on them can be inconsistent with the true phylogeny of GBS [26]. As with CPS, knowledge of this inconsistency on a global scale will benefit the development of GBS vaccines. Furthermore, when a GBS lineage possesses a phylogenetically inconsistent pilus type, it is unknown whether the lineage is more or less prone to carrying a phylogenetically inconsistent CPS type, i.e., whether recombination at the two loci affects each other. It is also unknown whether these phylogenetic inconsistencies are associated with pathogenicity traits such as invasiveness.

The rapid development of whole-genome sequencing (WGS) has made it cheaper to sequence entire genomes. To date, over 1000 GBS genomes have accumulated in the public database, providing good material for addressing the above issues on a global scale. In the present study, we performed in silico prediction of the phylogeny, CPS types, and pilus types for the GBS genomes in the NCBI GenBank database and, more importantly, investigated the relationship between phylogenetic inconsistency and the invasiveness of these isolates. We hope that this study helps elucidate the relevance of recombination to GBS virulence and guides the development of GBS vaccines.

Methods

Bacterial isolates studied

All of the GBS genomes were downloaded from the NCBI GenBank database (downloaded on March 2, 2019; see Supplementary Table 1). The geographic and host information was extracted from the corresponding BioSample files. The human isolates were further classified as invasive or colonizing based on the source information: those isolated from blood, cerebrospinal fluid, and placenta were considered invasive; those isolated from the genital tract or oral cavity were considered colonizing.
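The classification rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the host labels, and the exact source strings are assumptions, since isolation-source fields in BioSample records are free text and would need normalization before such a lookup works.

```python
# Hypothetical, normalized source labels (real BioSample fields are free text).
INVASIVE_SOURCES = {"blood", "cerebrospinal fluid", "placenta"}
COLONIZING_SOURCES = {"genital tract", "oral cavity"}
HUMAN_HOSTS = {"homo sapiens", "human"}


def classify_isolate(host, source):
    """Return 'invasive', 'colonizing', or None when the rule does not apply."""
    if host is None or source is None:
        return None
    if host.strip().lower() not in HUMAN_HOSTS:
        return None  # only human isolates are classified
    label = source.strip().lower()
    if label in INVASIVE_SOURCES:
        return "invasive"
    if label in COLONIZING_SOURCES:
        return "colonizing"
    return None  # source not covered by the stated rule
```

Isolates whose source is missing or falls outside the two lists are left unclassified rather than guessed.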

Phylogeny construction

The 7-gene multi-locus sequence typing (MLST) analysis (based on seven housekeeping genes) was performed using the online service, BacWGSTdb [27, 28]. MLST alleles and sequence types (STs) were assigned through the comparison of whole-genome data to the GBS MLST database (http://pubmlst.org/sagalactiae) [29]. The STs that differed by 1 or 2 alleles were further grouped to the same CC, following the nomenclature adopted by E. S. Björnsdóttir et al. [30].
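The grouping of STs into CCs can be illustrated with a minimal sketch. It assumes single-linkage grouping (two STs are linked when their 7-locus profiles differ at no more than two loci, and linked STs end up in the same group); the function name and the allele profiles in the usage example are hypothetical, and the sketch does not reproduce the CC naming convention of the cited nomenclature.

```python
from itertools import combinations


def group_into_ccs(profiles, max_diff=2):
    """Group STs whose allele profiles differ at <= max_diff loci (single linkage).

    profiles: dict mapping an ST label to a tuple of 7 allele numbers.
    Returns a list of sets of ST labels.
    """
    parent = {st: st for st in profiles}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        n_diff = sum(x != y for x, y in zip(profiles[a], profiles[b]))
        if n_diff <= max_diff:
            parent[find(a)] = find(b)

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), set()).add(st)
    return list(groups.values())


# Made-up profiles for illustration only (not real GBS allele numbers).
example = {
    "ST_A": (1, 1, 1, 1, 1, 1, 1),
    "ST_B": (1, 1, 1, 1, 1, 1, 2),
    "ST_C": (1, 1, 1, 1, 1, 3, 2),
    "ST_D": (9, 9, 9, 9, 9, 9, 9),
}
groups = group_into_ccs(example)
```

In this toy input, ST_A, ST_B, and ST_C fall into one group and ST_D forms its own.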

Tree-independent hierarchical Bayesian clustering was performed to determine the population structure using hierBAPS (http://www.helsinki.fi/bsg/software/BAPS/). Briefly, core genome multi-locus sequence typing (cgMLST) was performed using the BacWGSTdb service; the obtained allele matrix was used as input for the hierBAPS program; three levels of clustering were performed within the hierarchy, and a prior upper boundary of 20 clusters was established. A neighbor-joining tree based on the concatenation of whole-genome single-nucleotide polymorphisms (SNPs) was built using the BacWGSTdb service; the strain GBS-M002 (accession CP013908; serotype VI; pilus type 1, 2a; MLST ST1; collection region: Taiwan) was used as the reference genome.

CPS typing and pilus typing

For CPS typing, an in-house BLAST database was built based on information in the literature [31, 32], which included unique CPS genes for CPS identification. A sequence homology of > 95% with an alignment length of > 95% for the target gene was used as the threshold for predicting gene presence/absence. The CPS genes of CPS type IV and IX were too similar to be distinguished at the genomic level; thus, the two CPS types were split according to the SNPs at positions 327, 551, 1018, 1123, 1140, 1368, 1627, and 1832 [9, 33].

For pilus typing, an in-house BLAST database was established based on the literature [25, 34], which included all unique genes for pilus identification. A sequence homology of > 95% with an alignment length > 95% for the target gene was used as the threshold to predict gene presence/absence.
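The presence/absence threshold used for both CPS and pilus genes can be sketched as follows. The sketch assumes that "alignment length > 95%" means the aligned length exceeds 95% of the target gene length, and that each BLAST hit has already been reduced to a pair of percent identity and alignment length; the function and parameter names are hypothetical.

```python
def gene_present(hits, gene_length, min_identity=95.0, min_coverage=95.0):
    """Decide whether a target gene is present in a genome.

    hits: iterable of (percent_identity, alignment_length) pairs for the gene.
    gene_length: length of the target gene in the in-house database.
    Both thresholds are strict (> 95%), matching the wording of the Methods.
    """
    for percent_identity, alignment_length in hits:
        coverage = 100.0 * alignment_length / gene_length
        if percent_identity > min_identity and coverage > min_coverage:
            return True
    return False
```

For example, a hit with 98.2% identity over 990 bases of a 1000-base gene passes, whereas the same identity over 900 bases does not.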

The inconsistency of phylogeny with CPS or pilus typing was defined as follows. Assume that CPS/pilus type a is the most frequent type within BAPS cluster A and type b is the second most frequent, while the majority of isolates of CPS/pilus type b belong to another BAPS cluster B. The isolates of BAPS cluster A carrying CPS/pilus type b were then defined as showing inconsistency.
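A minimal sketch of this definition is given below. It makes two simplifying assumptions that go beyond the text: every non-dominant type in a cluster is examined (not only the second most frequent one), and "the majority of isolates of a type" is read as the cluster holding the largest number of isolates of that type. The function name and input layout are hypothetical.

```python
from collections import Counter, defaultdict


def flag_inconsistent(isolates):
    """Return the ids of isolates whose CPS/pilus type is phylogenetically inconsistent.

    isolates: list of (isolate_id, baps_cluster, type_label) tuples.
    """
    types_in_cluster = defaultdict(Counter)  # cluster -> counts per type
    clusters_of_type = defaultdict(Counter)  # type -> counts per cluster
    for _, cluster, type_label in isolates:
        types_in_cluster[cluster][type_label] += 1
        clusters_of_type[type_label][cluster] += 1

    # Most frequent type in each cluster, and the cluster holding most isolates
    # of each type (ties are broken by first appearance in the input).
    dominant_type = {c: cnt.most_common(1)[0][0] for c, cnt in types_in_cluster.items()}
    home_cluster = {t: cnt.most_common(1)[0][0] for t, cnt in clusters_of_type.items()}

    flagged = set()
    for isolate_id, cluster, type_label in isolates:
        if type_label != dominant_type[cluster] and home_cluster[type_label] != cluster:
            flagged.add(isolate_id)
    return flagged
```

A rare type that occurs only within a single cluster is not flagged under this reading, because its home cluster is the cluster in which it is found.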

Statistical analysis

Potential associations between variables such as CPS type and invasiveness were analyzed with the chi-square test of independence in SPSS (Version 21.0 for Windows). P values were adjusted using the FDR method when multiple comparisons were made. The significance level was set at P < 0.05.
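For readers without SPSS, the two steps can be approximated in plain Python. The sketch below is not the SPSS procedure itself: it assumes a Pearson chi-square test on a 2 × 2 table without continuity correction, and it assumes that the FDR adjustment is the Benjamini-Hochberg procedure, which the text does not specify. The example counts are the CPS type V figures reported in the Results (14 of 57 colonizing and 2 of 40 invasive isolates were phylogenetically inconsistent).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    Assumes that no row or column total is zero.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square survival function, df = 1
    return stat, p_value


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Rows: colonizing, invasive; columns: inconsistent, consistent (CPS type V).
stat, p_value = chi_square_2x2(14, 57 - 14, 2, 40 - 2)
```

With these counts the statistic is about 6.5 and the unadjusted P value is about 0.01, in line with the P < 0.05 reported for CPS type V.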

Results

Phylogeny of GBS worldwide

A total of 1016 GBS genomes were downloaded from the NCBI GenBank database (Supplementary Table 1). The analyzed isolates were collected from 28 countries in Asia, Europe, South America, Africa, Australia, and North America (Supplementary Table 2). Seven-gene MLST was performed to study the species’ phylogeny. The predominant CCs of the isolates were identified as CC1, CC61, CC23, CC17, and CC19, accounting for 20.2% (n = 205), 16.7% (n = 170), 15.3% (n = 156), 11.8% (n = 120), and 10.8% (n = 110) of the total isolates, respectively. Since housekeeping genes used for MLST may also undergo recombination that distorts the real phylogenetic relationships, we extracted the sequences of the core genomes and used a Bayesian clustering method (BAPS) for phylogeny reconstruction (Fig. 1). Thirteen BAPS clusters were identified based on the allele matrix of core genome MLST (cgMLST), the most common being cluster 3 (142 isolates, 14.0%), cluster 4 (142, 14.0%), cluster 2 (132, 13.0%), cluster 12 (120, 11.8%), cluster 5 (110, 10.8%), and cluster 8 (106, 10.4%).

Fig. 1 Phylogeny and its relationship with BAPS, MLST, CPS type, and pilus type. The inner neighbor-joining tree was built based on the whole-genome SNPs. The rings from inner to outer represent BAPS cluster, clonal complex (CC) by 7-gene MLST, CPS type, and pilus type, respectively

Some BAPS clusters matched CCs perfectly, such as BAPS cluster 3 corresponding to CC12, cluster 12 to CC17, and cluster 5 to CC19, whereas some CCs exhibited greater diversity in that they comprised more than one BAPS cluster (Table 1; Fig. 1). For example, CC1 comprised BAPS clusters 4 and 11, CC23 comprised clusters 6, 8, and 10, and CC61 comprised clusters 1 and 2. Overall, the two results did not contradict each other, showing the compatibility of the two typing systems.

Table 1 The clonal complex (CC), CPS, and pilus composition in each BAPS cluster

CPS typing of global GBS isolates

The CPS typing results demonstrated that CPS type II (250 isolates, 24.6%) was the most prevalent, followed by III (238, 23.4%), Ia (151, 14.9%), V (144, 14.2%), Ib (119, 11.7%), and IV (86, 8.5%). In a few BAPS clusters, all isolates possessed the same CPS type (Table 1; Fig. 1). For example, all isolates of BAPS cluster 6 possessed CPS type IV, those of cluster 7 possessed CPS type II, and those of cluster 13 possessed CPS type Ib. In the other BAPS clusters, however, the isolates carried a mixture of CPS types, with one CPS type appearing to be dominant and the secondary types likely to be phylogenetically inconsistent (Table 1). Statistically, BAPS clusters 3, 4, and 5 were prone to such inconsistency, whereas BAPS clusters 2, 8, 9, 12, and 13 were less prone to it (Table 1).

Viewed by CPS type, many CPS types also occurred in BAPS clusters other than their usual ones. While CPS type II was mostly found in BAPS clusters 1, 2, and 7, 27.6% (69/250) of its isolates appeared in BAPS clusters 3, 4, 5, 10, and 11. Similarly, 18.6% (16/86) of CPS type IV, 18.1% (26/144) of CPS type V, 14.3% (34/238) of CPS type III, and 2.0% (3/151) of CPS type Ia isolates appeared in their infrequent BAPS clusters. Thus, CPS types Ib and Ia were seldom involved in phylogenetic inconsistency, whereas CPS type II was the most prone to it (Table 2).

Table 2 Percentage of isolates with phylogenetic inconsistency in each CPS and pilus type

Identification of Pilus Islands in GBS

Of the 1016 GBS isolates, 463 (45.8%) were found to carry one pilus island and 535 (52.7%) to carry two. PI-1, PI-2a, and PI-2b appeared in 540 (53.1%), 581 (57.2%), and 416 (40.9%) isolates, respectively. The most common pilus island profile in these isolates was the simultaneous carriage of PI-1 + PI-2a (398, 39.2%). In particular, all isolates of BAPS cluster 2 carried PI-2b only, while most isolates of cluster 5 carried PI-1 + PI-2a (105/110, 95.5%), and most of cluster 8 carried PI-2a (105/106, 99.1%) (Fig. 1).

Since both the CPS typing and pilus typing results were overall consistent with BAPS clustering, most CPS types corresponded to a specific pilus type. The isolates carrying PI-1 + PI-2b were mainly of CPS type III (100/137, 73.0%), and the isolates of the PI-2a type were mainly of CPS types Ia and II (147/183, 80.3%). Conversely, the isolates of CPS type Ib mainly carried the PI-1 + PI-2a or PI-2b pilus (99/119, 83.2%), the isolates of CPS type III mainly carried the PI-1 + PI-2a or PI-1 + PI-2b pilus (203/238, 85.3%), and the isolates of CPS type V mainly carried the PI-1 + PI-2a pilus (125/144, 86.8%).

However, inconsistencies still existed between the pilus typing and the BAPS clustering. Taking the BAPS cluster as the unit, clusters 3 and 11 were prone to such inconsistency, whereas clusters 2, 8, 9, and 13 were less prone to it (Table 1). Taking the pilus type as the unit, isolates of the PI-1 + PI-2b and PI-2a types were more prone to phylogenetic inconsistency than the others, whereas isolates of the PI-1 + PI-2a type were less prone to it (Table 2).

Invasiveness of GBS

Of the 1016 isolates, 489 (48.1%) were from human sources. Of these, 187 isolates were derived from human blood, cerebrospinal fluid, or placenta and were thus considered invasive. The remaining 302 isolates, most commonly from the genital tract (125/302, 41.4%), were considered colonizing isolates. BAPS clusters 4, 6, 7, and 12 appeared to be more virulent than the others, with > 40% of their isolates being invasive (Table 1).

Regarding the relationship with the CPS typing results, the invasive isolates were mainly distributed in CPS types III (69/187, 36.9%) and V (40/187, 21.4%) (Table 3). Although the number of invasive isolates in these two CPS types was relatively high, the association did not reach statistical significance (P > 0.05). In contrast, CPS type Ib was more likely to appear in colonizing isolates than the other types (P < 0.05).

Table 3 The association between invasiveness and CPS and pilus typing

Regarding the distribution of the pilus types, the PI-2b type was more likely to appear in invasive isolates than other pilus types (P < 0.05; Table 3). In colonizing isolates, the PI-1 + PI-2a type was more common than the other types (P < 0.05; Table 3).

Relationships between recombination and invasiveness

Next, we analyzed whether the phylogenetically inconsistent isolates were concentrated in invasive isolates. For CPS type III isolates, 10 out of the 69 invasive isolates (14.5%) showed phylogenetic inconsistency; this proportion decreased to 2.1% (2/94) in the colonizing isolates (Fig. 2a). The situation in CPS type V was the converse: phylogenetic inconsistency in CPS type V isolates was more common in colonizing isolates (14/57, 24.6%) than invasive ones (2/40, 5.0%, P < 0.05; Fig. 2a).

Fig. 2 The relationship between phylogenetic inconsistency and invasiveness. For each CPS type (in panel a) and pilus type (in panel b), the isolates were further divided into invasive isolates and colonizing isolates. The y-axis indicates the proportion of phylogenetically inconsistent isolates (in blue) and phylogenetically consistent isolates (in orange). I, invasive isolates; C, colonizing isolates

For each pilus type, we did not find a difference in the distribution of phylogenetic inconsistency between invasive and colonizing isolates.

Discussion

This study determines the distribution of CPS types and pilus types among global GBS isolates and, in particular, investigates whether recombination events occur within the CPS loci and pilus islands. The motivation for this study is that capsular switching through recombination has been observed during the last decade in S. pneumoniae as the primary means by which the pathogen eludes vaccines. As CPS and pilus typing have been frequently reported to show phylogenetic inconsistency in GBS, this pathogen is very likely to adopt the same strategy as S. pneumoniae to escape the host’s immune system once vaccines come into use. It is therefore imperative to profile the global recombination pattern in these loci before GBS vaccines go into commercial use.

As a genetic event for laterally exchanging DNA, recombination is universally present in bacteria and results in an incongruent phylogeny between local recombinant fragments and adjacent regions. In the past decades, MLST has been considered a gold standard typing tool for characterizing bacterial isolates from the sequences of internal fragments of (usually) seven housekeeping genes. Nevertheless, recombination in the housekeeping genes has been reported in a number of bacterial pathogens, including Escherichia coli, Salmonella enterica, and Staphylococcus aureus [35,36,37]. Because MLST-based phylogeny is not entirely free from the noise created by recombination, scientists have turned to WGS to minimize this interference. By comparing the results of the genome-wide BAPS clustering and the 7-gene MLST, we found that the two matched well, suggesting that the seven genes used for MLST in GBS had undergone little recombination and were good candidates for within-species typing. A combination of BAPS clustering and MLST is expected to assess the bacterial phylogeny more accurately, and on this basis we further assessed the strains’ phylogenetic consistency with the CPS and pilus typing results.

Epidemiologically, CPS types have been shown to vary between years and nations. Even more importantly, CPS types may be linked with human diseases such as early-onset (age 0–6 days) sepsis and meningitis [38, 39]. The varying pathogenic property is likely to be attributable to a specific CPS, but may also be due to a combination of virulence factors specific to the particular genetic lineage. Here, our analysis of global isolates revealed that the human invasive isolates were mainly concentrated in CPS types III and V, whereas CPS type Ib was more likely to appear in colonizing isolates than other types. Several studies based on techniques other than WGS have also found that CPS type V exhibits high invasiveness; these include research performed in Shanghai, China [5]; England and Wales [40]; Portugal [41]; and Alberta, Canada [42]. Fortunately, the multivalent vaccines currently in trials already cover CPS type V as well as Ia, Ib, II, III, and IV [13,14,15].

Despite the overall pronounced correlation between CPS type and BAPS clustering, a nonnegligible proportion of GBS isolates did show signals of recombination in their CPS clusters. Among the main CPS types, CPS type II was more prone to phylogenetic inconsistency, whereas CPS types Ib and Ia showed the opposite trend. With respect to invasiveness, the colonizing isolates of CPS type V were more prone to phylogenetic inconsistency than the invasive ones. This finding is consistent with previous reports that colonizing isolates have a greater tendency for capsular switching [43]. A plausible explanation is that persistent colonization facilitates lateral genetic transfer among GBS isolates, thereby promoting capsular switching. Paradoxically, for CPS type III, phylogenetic inconsistency was less frequent in colonizing than in invasive isolates. We attribute this difference to an opposing driving force arising from the immunogenic pressure induced by host immunity [44]: bacteria resort to CPS switching to evade host immunity after the host generates antibodies against the CPS. The eventual extent of phylogenetic inconsistency may depend on the interaction of these two opposing forces and the fitness of the recombinant strain.

The pilus-like structure plays a role in GBS colonization and pathogenicity, making it another potential target for vaccine development. We found that PI-1 + PI-2a accounted for the highest proportion of pilus types. This combination is more likely to occur in colonizing isolates but is less prone to phylogenetic inconsistency. This situation contrasts with that for the CPS, possibly because this pilus combination has adapted to a higher degree to the colonization environment. In contrast, the isolates carrying PI-2a or PI-1 + PI-2b were frequently involved in phylogenetic inconsistency. According to previous reports, PI-2b is associated with invasive infections in neonates, and PI-2a with invasive diseases in adults [26, 31, 32, 45]. It is unclear whether this phenomenon results from the different immune statuses of children and adults. Here, we observed only the association between PI-2b and invasiveness, probably because of the sampling bias of the public database towards child patients. We also found that the invasive PI-2b isolates had not acquired PI-2b through recombination; instead, PI-2b was inherently present in these isolates. It is possible that PI-2b itself benefits the evasion of the host immune system, or that the genetic background of these lineages carries other genes beneficial for invasion.

When the phylogenetic inconsistencies of the CPS and pilus typing results were considered together, we found that BAPS clusters 4 and 5 had the CPS locus prone to inconsistency, whereas BAPS cluster 11 had the pilus locus prone to inconsistency. This suggests that, for these lineages, recombination may occur independently at the two loci. However, BAPS cluster 3 had both loci prone to phylogenetic inconsistency, indicating that a double switch at the two loci may contribute more to the cluster’s fitness than a single switch does.

The strength of this study is the global collection of GBS genomes used for analysis, which also brings a few limitations. First, many isolates lack detailed clinical information in the public database, making some analyses impossible, such as the distinction between early-onset and late-onset GBS disease. Second, although the public database collects bacterial genomes worldwide, many genomes were released from only a few countries, the majority of which are in Asia, Europe, and North America. Consequently, it is unknown whether the conclusions drawn in this study apply to countries not represented in the public database.

Conclusions

This study characterizes the global distribution of GBS CPS types and pilus types as well as their relevance to invasive disease. While we found little recombination occurring in MLST genes, recombination signals were detected in both CPS and pilus genes. This suggests that GBS may rely on recombination at specific virulence genes to better evade the host’s immune system. The present findings are beneficial for the current development of GBS vaccines. The CPS types and pilus types that are frequently involved in recombination need to be covered as the primary vaccine targets. Furthermore, continued surveillance is required to determine whether capsule switching occurs once GBS vaccines come into use, which can be revealed by comparing current and future CPS distributions.