Introduction

Corynebacterium diphtheriae is the causative agent of diphtheria, an acute, communicable and potentially fatal disease that mainly affects children. The disease is transmitted through contact with respiratory droplets from infected individuals. During the pre-immunization era, diphtheria toxin (tox) was the major cause of mortality in infected individuals. The incidence fell dramatically after the introduction of the toxoid vaccine (PW8) in the twentieth century, with fewer than 8000 cases reported worldwide in 2016 [1]. The clinical presentation is generally characterized by the formation of an inflammatory pseudomembrane in the upper respiratory tract [2]. The interaction between the bacterium and its infecting phage plays an important role in toxin acquisition. DtxR, an iron-dependent toxin repressor produced by C. diphtheriae, regulates the expression of tox, which is introduced by a corynebacteriophage: it represses tox transcription under high-iron conditions and relieves the repression when iron is limited [3, 4].

In Malaysia, the diphtheria toxoid vaccine is listed in the national immunization schedule and provided by the Ministry of Health Malaysia. However, vaccination is not mandatory, and not all parents bring their children to be vaccinated. Unvaccinated individuals are at high risk of acquiring the disease from potential diphtheria carriers. Sporadic cases have been reported over the years, and there was a sudden surge in 2016, with 31 cases compared with 4, 2 and 4 cases in 2013, 2014 and 2015, respectively [5]. Our study provides a general overview of C. diphtheriae in Malaysia by determining the relatedness among local isolates collected over 31 years (1986 to 2016) and comparing these strains with strains from other countries using single nucleotide polymorphism (SNP) analysis. We also studied the genetic variability of tox and DtxR in these strains.

Main text

Materials and methods

A total of 80 C. diphtheriae isolates, comprising 58 toxigenic and 22 non-toxigenic strains from Malaysia, India, Belarus, Africa, Brazil, the United Kingdom, Italy and the USA, were analysed in this study. These included 28 Malaysian isolates (27 toxigenic and 1 non-toxigenic) that we had previously submitted to GenBank under project PRJNA345527 [6]. All 27 of these toxigenic strains were positive by the Elek test [7]. The remaining genomes were selected randomly from the C. diphtheriae strains deposited in GenBank [8,9,10,11]. All genome data used in this study and their accession numbers are listed in Table 1. Phylogenetic trees were constructed using kSNP version 3.0 [12] with a k-mer size of 19 and visualized with FigTree version 1.4.3 [13]. Two separate trees were constructed, one based on SNPs in the core genome and one based on SNPs in the pan-genome. For the pan-genome analysis, only shared SNPs found in at least 90% of the genomes were considered. Branch length reflects the number of SNP changes. The trees were analysed at a bootstrap value > 0.9 and arranged in decreasing order. Multiple sequence alignments of the tox and DtxR genes were constructed and analysed with Clustal Omega [14].
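
To make the k-mer-based SNP definition and the 90% threshold concrete, the following minimal Python sketch treats a SNP locus as a k-mer (k odd) whose flanking bases are identical across genomes while the central base differs, and then separates loci present in every genome (core) from loci present in at least a chosen fraction of genomes. It is an illustration of the general idea only, not the kSNP implementation: the function names, the toy sequences, k = 5 and the reading of the 90% rule as "locus present in at least 90% of genomes" are all assumptions, and reverse-complement handling is omitted.

```python
from collections import defaultdict


def find_kmer_snps(genomes, k):
    """Return {(left_flank, right_flank): {genome_name: central_base}} for
    contexts whose central base differs between genomes (toy model)."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that a central base exists")
    half = k // 2
    # context -> genome -> set of central bases observed in that genome
    seen = defaultdict(lambda: defaultdict(set))
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            context = (kmer[:half], kmer[half + 1:])
            seen[context][name].add(kmer[half])

    snps = {}
    for context, per_genome in seen.items():
        # Skip contexts that occur with conflicting bases inside one genome.
        if any(len(bases) > 1 for bases in per_genome.values()):
            continue
        alleles = {name: next(iter(bases)) for name, bases in per_genome.items()}
        if len(set(alleles.values())) > 1:
            snps[context] = alleles
    return snps


def split_core_and_majority(snps, n_genomes, min_frac=0.9):
    """Core loci occur in all genomes; majority loci in >= min_frac of them."""
    core = [ctx for ctx, a in snps.items() if len(a) == n_genomes]
    majority = [ctx for ctx, a in snps.items() if len(a) / n_genomes >= min_frac]
    return core, majority


# Hypothetical toy genomes (not real C. diphtheriae data).
genomes = {f"g{i}": "ACAGTGGACC" for i in range(1, 6)}
genomes.update({f"g{i}": "ACTGTGGTCC" for i in range(6, 10)})
genomes["g10"] = "ACTGT"  # lacks the second locus entirely

snps = find_kmer_snps(genomes, k=5)
core, majority = split_core_and_majority(snps, n_genomes=len(genomes))
print("core loci:", core)
print("majority loci:", sorted(majority))
```

In this toy set, the locus flanked by AC and GT is present in all ten genomes, whereas the locus flanked by GG and CC is missing from one genome and is therefore retained only under the 90% threshold.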

Table 1 Designation of the Corynebacterium diphtheriae isolates used in the analysis in this study

Results and discussion

In this study, we used a total of 80 genomes, including toxin-bearing and non-toxin-bearing C. diphtheriae, to create an overview of C. diphtheriae strains in Malaysia. Taking advantage of advances in next-generation sequencing, we applied whole-genome SNP analysis and compared SNPs in the core genome with those in the pan-genome, which comprises the full complement of bacterial genes, that is, the core genome plus the dispensable genome [15, 16]. Relationships between specific geographical locations within Malaysia (Peninsular and East Malaysia) were not evaluated in this study. We assumed that probable carriers move frequently between these two areas, which could affect such an analysis.

Fewer SNPs were observed in the core genome (29,184 SNPs) than in the pan-genome (55,071 SNPs). Both the core genome (Fig. 1) and pan-genome (Fig. 2) SNP-based phylogenetic analyses divided the C. diphtheriae strains into two large clusters: I and II, and A and B, respectively. In the core genome analysis, clusters I and II contained almost equal percentages of toxigenic and non-toxigenic strains. In the pan-genome analysis, however, the majority of the toxigenic strains were in cluster B (75.9%), whereas most non-toxigenic strains were in cluster A (63.6%). Pearson’s Chi-square test showed that the distribution of toxigenic and non-toxigenic strains differed significantly between clusters A and B (p = 0.001), with cluster A consisting mainly of non-toxigenic strains and cluster B mainly of toxigenic strains. The majority of Malaysia’s toxigenic isolates (85.2%) clustered in B, except for C110, C319, C517 and RZ358. These four isolates, together with the toxigenic strains TH510 and TH1526 from India; CD1791, CD2173, CD72, CD2225, CD5052 and CD4728 from Belarus; CD31A from Brazil; and NCTC13129 and NCTC5011 from the United Kingdom, were scattered among the non-toxin-bearing isolates. Three of the four Malaysian toxigenic isolates (C110, C319 and C517, but not RZ358) grouped with isolates from Belarus and the United Kingdom in cluster A. These observations indicate a close relationship between these non-toxigenic and toxigenic strains. It is therefore possible that tox is not the only determinant of pathogenicity, which could limit the effectiveness of the toxoid vaccine. Growing awareness of virulence factors other than the toxin has prompted investigations into iron acquisition systems, resistance mechanisms and pathogenicity islands [17, 20, 21].
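
The Chi-square comparison can be approximately reproduced from the reported percentages. The minimal Python sketch below back-calculates the 2 x 2 contingency table from the 58 toxigenic and 22 non-toxigenic strains and the 75.9% and 63.6% figures, then computes the uncorrected Pearson statistic. The rounded counts, the absence of a continuity correction and the function name are assumptions made for illustration; the original software and settings are not stated in the text.

```python
import math


def pearson_chi_square_2x2(table):
    """Uncorrected Pearson chi-square test for a 2 x 2 table.

    table = [[a, b], [c, d]]; returns (chi2, p) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts back-calculated from the reported percentages (an assumption).
toxigenic_total, non_toxigenic_total = 58, 22
tox_in_b = round(0.759 * toxigenic_total)         # toxigenic strains in cluster B
nontox_in_a = round(0.636 * non_toxigenic_total)  # non-toxigenic strains in cluster A

# Rows: toxigenic, non-toxigenic. Columns: cluster A, cluster B.
table = [
    [toxigenic_total - tox_in_b, tox_in_b],
    [nontox_in_a, non_toxigenic_total - nontox_in_a],
]

chi2, p = pearson_chi_square_2x2(table)
print(f"table = {table}, chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under these assumptions the statistic is close to 10.9 with a p value of roughly 0.001, which is consistent with the reported significance level.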

Fig. 1

Core genome SNP-based phylogenetic tree of 80 Corynebacterium diphtheriae strains grouped in clusters I and II. A SNP was considered only if at least 90% of the genomes had the nucleotide change at that position. ^ and * denote Malaysian and non-toxin-bearing isolates, respectively

Fig. 2

Pan-genome SNP-based phylogenetic tree of 80 Corynebacterium diphtheriae strains grouped in clusters A and B. A SNP was considered only if at least 90% of the genomes had the nucleotide change at that position. ^ and * denote Malaysian and non-toxin-bearing isolates, respectively

The overall topology of the core genome and pan-genome SNP-based phylogenetic trees differed. Pan-genome SNP analysis can detect slight changes between genetically close organisms, especially in the accessory genome, and can therefore further discriminate strains with similar core genomes. The difference could be due to regrouping of the strains driven by SNP changes in the accessory genome, as opposed to the conserved core genome. A similar observation was reported by Sangal et al., who found discrepancies in clustering and degree of variation when the same set of strains was analysed by core versus accessory genome and proteome [17]. A marked difference was that, in the pan-genome SNP tree, a large cluster of toxigenic strains shifted to cluster B and both BH8 and CD31A from Brazil shifted to cluster A. The pan-genome SNP analysis also placed the Malaysian strains RZ632 and RZ356 closer to the African strains. It is also interesting that a number of recent outbreak strains isolated in 2016 from Malaysia, India and Africa grouped closely together within cluster B. The clustering obtained by both SNP analyses differed slightly from the phylogenetic trees generated from core genome sequence alignments by Hong et al. and Trost et al. [8, 20]. Clustering within a clade may not be altered when a genetically distinct species is introduced. In our study, however, the Belarus strains showed high relatedness to the Malaysian strains, which led to recalculation of the genetic differences and restructuring of the clusters.

PW8 (toxoid vaccine strain) was used as the reference for molecular analysis of the tox and DtxR genes. The tox genes of all local strains were aligned and compared against PW8 using Clustal Omega. One or two point mutations were detected at the nucleotide level in tox, but the amino acid sequences were identical to that of PW8 except in RZ319 and RZ597, which carried a non-synonymous substitution of histidine by tyrosine at position 24 (H24Y). PROVEAN predicted no deleterious effect for this substitution [18, 19]. This observation indicates that Malaysia’s strains produce a single antigenic type of toxin similar to the toxoid.
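
The distinction between nucleotide-level point mutations that leave the protein unchanged and those that alter an amino acid can be illustrated with a short Python sketch. It compares two ungapped, in-frame coding sequences codon by codon using the standard genetic code and labels each difference as synonymous or non-synonymous, writing the latter in the usual notation (for example H24Y). The sequences below are hypothetical toy strings, not the tox gene, the function name is illustrative, and the sketch does not replace an alignment tool such as Clustal Omega or an effect predictor such as PROVEAN.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def compare_coding_sequences(ref_dna, query_dna):
    """List codon differences between two aligned, in-frame sequences.

    Returns tuples of (codon_position, kind, label), positions counted from 1.
    """
    if len(ref_dna) != len(query_dna) or len(ref_dna) % 3 != 0:
        raise ValueError("sequences must be the same length and in frame")
    changes = []
    for start in range(0, len(ref_dna), 3):
        ref_codon = ref_dna[start:start + 3]
        qry_codon = query_dna[start:start + 3]
        if ref_codon == qry_codon:
            continue
        ref_aa, qry_aa = CODON_TABLE[ref_codon], CODON_TABLE[qry_codon]
        position = start // 3 + 1
        if ref_aa == qry_aa:
            changes.append((position, "synonymous", f"{ref_codon}>{qry_codon}"))
        else:
            changes.append((position, "non-synonymous", f"{ref_aa}{position}{qry_aa}"))
    return changes


# Hypothetical toy sequences: reference encodes M-K-H-A.
reference = "ATGAAACACGCT"
query = "ATGAAGTACGCT"  # AAA>AAG keeps lysine; CAC>TAC changes His to Tyr

changes = compare_coding_sequences(reference, query)
for change in changes:
    print(change)
```

In the toy example, the first difference is silent at the protein level, while the second produces a histidine-to-tyrosine change of the same kind as the H24Y substitution described above.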

Genetic variation in DtxR might influence tox gene expression and the virulence of C. diphtheriae [3, 4]. Comparison of the local strains with PW8 showed that none of the C. diphtheriae strains except four (C110, C517, C319 and C113) had a non-synonymous amino acid change in DtxR. Two non-synonymous substitutions, alanine to valine at position 147 (A147V) and leucine to isoleucine at position 214 (L214I), were found in C110, C517 and C319, all of which are in cluster A. This is in concordance with Nakao et al., who reported that most amino acid substitutions occur in the carboxyl-terminal half of DtxR and observed both A147V and L214I in strains from Russia and Ukraine [3]. A different substitution was found in our isolate C113, in which threonine was replaced by asparagine at position 150 (T150N). All of these substitutions were predicted to be neutral by PROVEAN [18, 19].

Our analysis provides a general overview of Malaysia’s C. diphtheriae isolates and of the differences in genetic relatedness attributable to the accessory genome. Pan-genome SNP analysis allows rapid and efficient assessment of genetic relatedness, especially in outbreak studies, where it can discriminate variation in the core and accessory genomes between genetically similar species [15, 16]. Further insight into the variability of the accessory genome between closely related toxigenic and non-toxigenic local strains, for instance RZ358, will be required to understand acquired pathogenicity other than the toxin, such as the presence of functional genomic islands [17, 20, 21]. Our current analysis, which focused mainly on local isolates, divided the toxigenic and non-toxigenic strains into two clusters with statistical significance. The observation might differ if more toxin-bearing clones with non-toxin-related pathogenicity were included in the future.

In conclusion, sporadic diphtheria cases in Malaysia over the years were shown to involve diverse strains. Based on the shared SNPs in the pan-genome analysis, it is possible that the C. diphtheriae strains isolated in Malaysia are of Belarus, Africa and India origin, or vice versa. However, the majority of the strains isolated in the 2016 outbreak clustered with strains isolated as early as 1986, indicating that a local strain has persisted in the population for decades. The non-toxigenic and toxigenic strains could also be separated into clusters A and B according to toxin status. All the Malaysian clinical isolates produced a single antigenic type of diphtheria toxin, similar to PW8. Given that the amino acid composition of the toxin and DtxR in these local isolates is well conserved relative to PW8, a change in the efficacy of the currently used toxoid vaccine is unlikely.

Limitations

Investigation of the specific type of accessory genome would be useful to understand the connection between toxigenic and non-toxigenic Corynebacterium diphtheriae strains in Malaysia. Most of the local C. diphtheriae isolates are toxigenic, and only one non-toxigenic strain was available for analysis.