Introduction

Porphyromonas gingivalis is a Gram-negative anaerobic bacterium and is considered a major etiological agent of destructive periodontal diseases [1, 2]. It is frequently detected at a high rate in severely inflamed subgingival lesions, whereas it is isolated from the healthy population at a much lower rate, and it can constitute a significant proportion of the oral microflora. Previous studies have also shown phenotypic differences in the pathogenicity of P. gingivalis strains in animal models [3] and in humans [4]. Therefore, P. gingivalis might be an important potential pathogen [5]. Recently, DNA fingerprinting and restriction fragment length polymorphism (RFLP) analysis have confirmed that highly virulent P. gingivalis isolates always carry a rag locus, whereas avirulent strains do not [6–8].

The rag locus of virulent P. gingivalis encodes RagA (receptor antigen gene A), a 115-kDa outer membrane protein with features of a TonB-dependent receptor, and RagB, a 55-kDa antigen that elicits an elevated immunoglobulin G response in periodontal patients. ragA and ragB are co-distributed and co-transcribed, and their products putatively form complexes on the surface of P. gingivalis. These complexes constitute a membrane transporter system involved in the uptake or recognition of a specific carbohydrate or glycoprotein [9–11]. Four rag locus variants have been detected in periodontal patients [2]. The rag locus has three characteristics: (1) it has a lower G+C content than the average (48%) of the complete genome; (2) a 1,338-bp insertion sequence (IS1126, now known as ISPg1) is located upstream of the rag locus; this mobile element is flanked by 12-bp inverted repeats, generates a 5-bp target site duplication [12], and contains a single open reading frame (ORF) encoding a transposase of the IS5 family [13]; (3) polymerase chain reaction (PCR) and Southern blot results indicate that the rag locus has a restricted distribution within the species (existing only in virulent genotypes); the rag locus has also been amplified from subgingival samples in which P. gingivalis was not detected. These characteristics have led to speculation that the rag locus arose from exogenous genes acquired via horizontal gene transfer, and that it may represent a pathogenicity island in P. gingivalis. However, no report has so far indicated where the rag locus came from.

Because the origin of the rag locus is still unknown, the present work aimed to investigate the prevalence of P. gingivalis in the gingival crevicular fluid (GCF) of periodontitis patients from Zhenjiang, China, to determine the distribution of rag locus variants among periodontitis patients infected with P. gingivalis using multiplex PCR, and to analyze the evolutionary origin of the P. gingivalis rag locus.

Materials and methods

Subjects, selection of sites, and sampling and DNA extraction

A total of 110 GCF samples were obtained from 110 patients (58 female and 52 male, aged between 15 and 68 years) attending the First Affiliated Hospital of Jiangsu University, China, for periodontal diseases. Patients were screened for periodontal disease by radiographic evidence of bone loss, or by visible plaque, marginal bleeding, bleeding on probing (BOP), and clinical attachment loss (CAL) and periodontal probing depth (PD) ≥4 mm at two or more sites [14, 15]. Exclusion criteria were pregnancy, smoking, and systemic antimicrobial/anti-inflammatory drug therapy within 6 months before the baseline visit. The 110 patients were classified according to the 1999 International Workshop for a Classification of Periodontal Diseases and Conditions [16, 17]: 57 patients had localized chronic periodontitis, 33 had generalized chronic periodontitis, and 20 had necrotizing periodontal diseases (three patients also had diabetes and five had cardiovascular disease). The present study was conducted in accordance with the Declaration of Helsinki. All patients gave informed consent and the research protocol was approved by the Committee for Ethical Affairs of Jiangsu University.

GCF samples were obtained from the two deepest periodontal pockets in the maxilla according to the technique of Rüdin et al. [18]. Before sampling, the respective site was cleaned with cotton rolls, a gentle air stream was applied for 5 s at a 90° angle to the tooth axis, and supragingival plaque was removed. Standardized paper strips were inserted 1 mm into the sulcus and held for 30 s. Strips contaminated with blood were discarded. To avoid evaporation, paper strips with GCF were immediately placed in a sterile Eppendorf tube and stored at −20°C until laboratory analysis.

DNA was extracted from the samples by the boiling lysis method [19]. Templates were prepared by suspending the samples in 200 μl of sterile water, followed by boiling for 10 min and centrifuging for 3 min.

PCR detection of P. gingivalis and multiplex PCR amplification of rag locus genes

Primers specific for 16S rRNA were used to detect P. gingivalis, and primers for the four rag locus variants were used to amplify the rag locus by multiplex PCR in GCF samples containing P. gingivalis; the primers are listed in Table 1 [2, 20]. PCR assays were performed in 15-μl volumes containing 1.5 μl of 10× PCR buffer (Mg2+), 0.2 mM of each deoxynucleoside triphosphate (dNTP), 0.1 μM of primers, and 1.5 units of Taq DNA polymerase (Takara Biotechnology [Dalian] Co., Ltd.). Cycling consisted of 10 min at 94°C, followed by 35 cycles of 30 s at 94°C, 45 s at 50°C (for 16S rRNA) or 54°C (for the rag locus), and 1 min at 72°C, with a final step of 10 min at 72°C.
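The cycling conditions above can be written down as a small data structure, which makes the two annealing temperatures explicit. This is only an illustrative sketch: the names `pcr_program` and `total_minutes` are hypothetical, and the run-time estimate sums hold times only, ignoring the ramp time of any particular thermocycler.

```python
def pcr_program(annealing_c):
    """Return the program as (label, temperature in C, seconds, repeats) tuples."""
    return [
        ("initial denaturation", 94, 600, 1),
        ("denaturation", 94, 30, 35),
        ("annealing", annealing_c, 45, 35),
        ("extension", 72, 60, 35),
        ("final extension", 72, 600, 1),
    ]


def total_minutes(program):
    """Sum the hold times of all steps; ramping between temperatures is ignored."""
    return sum(seconds * repeats for _, _, seconds, repeats in program) / 60


# Annealing temperatures taken from the text: 50 C for 16S rRNA, 54 C for the rag locus.
ANNEALING_C = {"16S rRNA": 50, "rag locus": 54}

for target, temp in ANNEALING_C.items():
    print(target, temp, "C annealing,", total_minutes(pcr_program(temp)), "min of hold time")
```

Both programs have the same hold time because only the annealing temperature differs between the two assays.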

Table 1 Primers used in this study

Databases and search strategies

A comprehensive PSI-BLAST search was performed against the National Center for Biotechnology Information (NCBI) protein database (Reference proteins, RefSeq proteins) using the published ragA1 and ragB1 proteins of P. gingivalis as query sequences. The E-value cutoffs were e-100 for ragA1 and e-10 for ragB1. Redundant sequences were removed with the DAMBE program, and the corresponding nucleotide sequences were retrieved from GenBank.
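The two post-search steps described above, applying an E-value cutoff and removing redundant sequences, can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the hit records are hypothetical placeholders, and redundancy is reduced here to dropping exact duplicate sequences, whereas the criterion applied by DAMBE is not specified in the text.

```python
def filter_hits(hits, max_evalue):
    """Keep hits whose E-value is at or below the cutoff."""
    return [h for h in hits if h["evalue"] <= max_evalue]


def drop_identical(hits):
    """Keep only the first hit for each distinct sequence string."""
    seen = set()
    unique = []
    for h in hits:
        if h["seq"] not in seen:
            seen.add(h["seq"])
            unique.append(h)
    return unique


# Hypothetical hits: identifiers and sequences are placeholders, not real records.
hits = [
    {"id": "hit_1", "evalue": 1e-150, "seq": "MKRLT"},
    {"id": "hit_2", "evalue": 1e-120, "seq": "MKRLT"},  # same sequence as hit_1
    {"id": "hit_3", "evalue": 1e-40, "seq": "MKKIL"},   # above the ragA1 cutoff
]

# 1e-100 is the ragA1 cutoff quoted in the text; ragB1 would use 1e-10.
kept = drop_identical(filter_hits(hits, max_evalue=1e-100))
print([h["id"] for h in kept])
```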

Additionally, functionally similar proteins were retrieved from GenBank. These included the hemin/hemoglobin receptor protein (HmuR) and TonB-linked adhesin (Tla) of P. gingivalis, the TonB-dependent colicin I receptor (CirA) of Escherichia coli and Salmonella enterica [9, 21], the lactoferrin and transferrin binding systems of Neisseria and Haemophilus species (TBP2 and LBP2), P. gingivalis orf3, Vibrio cholerae IS1358, and E. coli orfH. These proteins were used to analyze the gene characteristics.

Sequence comparison and phylogenetic analysis

Multiple sequence alignments were performed using the ClustalX package, and then the results were manually corrected. All alignment gap sites were eliminated before phylogenetic analyses. The phylogenetic trees were reconstructed with the neighbor-joining (NJ) and minimum evolution (ME) methods implemented in MEGA4.0. The bootstrap values were estimated with 1,000 replications. Multiple sequence alignments were also carried out between the rag locus and highly homologous sequences by ClustalX in order to further identify conserved protein motifs. The alignments were shaded by the BOXSHADE program (http://www.ch.embnet.org/software/BOX_form.html).
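Eliminating all alignment gap sites before tree building corresponds to what is often called complete deletion: any column in which at least one sequence has a gap is dropped. The sketch below illustrates that step on a toy alignment, followed by a simple proportion-of-differences distance of the kind a distance-based method such as NJ takes as input. The function names and sequences are hypothetical, and the distance model actually used in MEGA4.0 is not stated in the text.

```python
def remove_gap_columns(alignment, gap="-"):
    """Drop every alignment column in which any sequence has a gap."""
    names = list(alignment)
    seqs = [alignment[n] for n in names]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("aligned sequences must have equal length")
    kept = [col for col in zip(*seqs) if gap not in col]
    return {n: "".join(col[i] for col in kept) for i, n in enumerate(names)}


def p_distance(a, b):
    """Proportion of sites that differ between two gap-free aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Toy alignment with placeholder sequences.
toy = {"s1": "MK-LTA", "s2": "MRRLTA", "s3": "MKRL-G"}
clean = remove_gap_columns(toy)
print(clean)
print(p_distance(clean["s1"], clean["s2"]))
```

Complete deletion keeps the comparison sites identical across all sequence pairs, at the cost of discarding columns that are informative for most sequences.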

Tests for selection

To test whether the rag locus was evolving neutrally, we aligned the nucleotide sequences and then performed two likelihood ratio tests. First, we compared a model in which all branches had the same dN/dS ratio to one in which each branch had its own ratio, to determine whether it was appropriate to use branch-specific tests for selection. As the free-ratio model was a significantly better fit to the data (P < 0.0001), we then performed a second set of tests, comparing a model in which the dN/dS ratio was fixed to 1 with a model in which the dN/dS values of all branches were freely estimated. Model likelihoods were estimated using PAML [22].
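A likelihood ratio test compares two nested models through the statistic 2(lnL_alt − lnL_null), which is referred to a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. The sketch below covers only the one-parameter case, such as a single dN/dS ratio fixed to 1 versus freely estimated. The comparison of the one-ratio and free-ratio models has more degrees of freedom (one ratio per branch) and would need the general chi-square distribution. The log-likelihood values are hypothetical and are not taken from the study.

```python
import math


def lrt_df1(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one free parameter.

    Returns the statistic 2 * (lnL_alt - lnL_null) and its chi-square p-value
    with one degree of freedom.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0:
        return stat, 1.0
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods, for illustration only.
stat, p = lrt_df1(lnl_null=-5210.4, lnl_alt=-5180.9)
print(round(stat, 1), p < 0.0001)
```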

Results

The prevalence of P. gingivalis and four rag locus variant gene distributions

We detected P. gingivalis in 29 of the 110 GCF samples: one of 57 localized chronic periodontitis cases, nine of 33 generalized chronic periodontitis cases, and 19 of 20 necrotizing periodontal disease cases. Among the 29 GCF samples containing P. gingivalis, the rag locus genes were amplified from only 19 (ten rag1, six rag2, and three rag3), while rag4 was not detected. Of these 19 cases, 18 were from necrotizing periodontal disease patients and one was from a generalized chronic periodontitis patient. In particular, the rag locus was also amplified from the three patients with diabetes and the five patients with cardiovascular disease. All 19 cases had at least three sites with a PD of 8 to 10 mm, accompanied by clearly visible plaque or BOP and CAL, whereas the rag locus was not amplified from any of the localized chronic periodontitis patients, whose PD was normally 4–6 mm (Fig. 1).
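The detection rates implied by these counts can be reproduced with a few lines of arithmetic. The sketch below uses only the counts reported above; the variable and function names are illustrative.

```python
# (P. gingivalis-positive samples, total patients) per diagnosis, from the text.
counts = {
    "localized chronic periodontitis": (1, 57),
    "generalized chronic periodontitis": (9, 33),
    "necrotizing periodontal disease": (19, 20),
}
# rag locus variants amplified among the P. gingivalis-positive samples.
rag_variants = {"rag1": 10, "rag2": 6, "rag3": 3, "rag4": 0}


def prevalence(positive, total):
    """Percentage of positive samples."""
    return 100.0 * positive / total


for diagnosis, (positive, total) in counts.items():
    print(f"{diagnosis}: {positive}/{total} = {prevalence(positive, total):.2f}%")

total_positive = sum(p for p, _ in counts.values())
total_patients = sum(t for _, t in counts.values())
print(f"overall: {total_positive}/{total_patients} = "
      f"{prevalence(total_positive, total_patients):.1f}%")
print(f"rag-positive among P. gingivalis-positive: "
      f"{sum(rag_variants.values())}/{total_positive}")
```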

Fig. 1a–c The prevalence of Porphyromonas gingivalis and four rag locus variant gene distributions. a Agarose gel electrophoresis image of P. gingivalis detection. Lane 1 marker, lane 2 positive control, lane 11 negative control, lanes 3–10 samples. b Image of four rag locus variant gene distributions. Lane 1 marker, lane 2 positive control, lane 11 negative control, lanes 3–10 samples. c Case numbers of varying degrees of periodontitis, the number of P. gingivalis detected in different periodontitis patients, and the four rag locus variant gene distributions

Phylogenetic analysis

A total of 153 homologous amino acid sequences were retrieved from the RefSeq proteins. Redundant sequences were removed with the DAMBE program, and a total of 50 protein sequences were included in the final dataset. To determine the evolutionary relationships between the P. gingivalis rag locus and homologous sequences, phylogenetic trees were reconstructed by two different methods (ME and NJ) with high bootstrap values. There were three major clusters (I, II, and III), which were statistically supported (Fig. 2). Cluster I mainly included the TonB-dependent receptor (SusC/CirA) of Bacteroides sp., ragA of P. gingivalis and Porphyromonas sp., and the TonB-dependent receptor of Capnocytophaga sp. Cluster II comprised the TonB-dependent receptors of Parabacteroides sp., Dyadobacter fermentans, Robiginitalea biformata, Capnocytophaga ochracea, Flavobacterium johnsoniae, Croceibacter atlanticus, and Bacteroides sp. Cluster III comprised ragB of P. gingivalis and Porphyromonas sp. and SusD of Prevotella sp., Bacteroides sp., and Capnocytophaga sp. The tree showed that branches 1 and 2 in cluster I were paralogous, as were the two branches in cluster III. ragA was highly similar to the TonB-dependent receptor of Bacteroides sp., and ragB was similar to ragB/SusD of Prevotella sp., Bacteroides sp., and Capnocytophaga sp.

Fig. 2 Phylogenetic relationship between the P. gingivalis ragA/B and highly homologous protein sequences

Analysis of G+C content

The G+C contents of the rag locus, homologous genes, and functionally similar genes were analyzed with MEGA4.0 (Table 2). Based on this table, the gene sequences could be divided into three groups: lower (<40%), moderate (between 40% and 50%), and higher (>50%) G+C content. The G+C contents of the P. gingivalis rag locus genes were close to those of Bacteroides sp. Sus.
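G+C content is the percentage of guanine and cytosine among the unambiguous bases of a sequence. The sketch below computes it and assigns a sequence to one of the three groups defined above. It is a plain re-implementation for illustration, not the routine used by MEGA4.0, and the function names are hypothetical.

```python
def gc_content(seq):
    """Return the G+C percentage of a nucleotide sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_class(percent):
    """Assign the group used in the text: lower (<40%), moderate (40-50%), higher (>50%)."""
    if percent < 40:
        return "lower"
    if percent <= 50:
        return "moderate"
    return "higher"


# Toy sequence, not a real gene.
toy = "ATGAAATTTGCA"
print(round(gc_content(toy), 1), gc_class(gc_content(toy)))
```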

Table 2 G+C contents of the rag locus, homologous genes, and function-similar sequences

Multiple alignments of the rag locus and Sus

The P. gingivalis rag locus and Bacteroides sp. Sus were aligned for further evolutionary analysis (Fig. 3). The RagA and SusC proteins had three domains: the carboxypeptidase regulatory region, the TonB-dependent receptor plug domain, and the TonB-dependent receptor domain (also named TonB box I, box II, and box III). RagB and SusD consisted of two domains, lipoprotein and RagB/SusD. All of these domains are shown in Fig. 3. As shown in Fig. 3a, the RagA motifs generally matched the three domains of Bacteroides sp. SusC well, for example, P××GA×××V×G×TT×G××TD×DG×F×LS at TonB box I and LKDA××T×IYG×RA×NGV at TonB box II. However, amino acids were often substituted in other domains. Some conserved positions in the lipoprotein and RagB/SusD domains are also indicated in Fig. 3b. Nevertheless, most positions in other domains showed far less conservation.
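The motifs quoted above use × for a position where any residue is allowed. One way to locate such a motif in a protein sequence is to translate it into a regular expression in which each × becomes a single-character wildcard. The sketch below does this for the TonB box II motif; the test peptide is a hypothetical string built to fit the motif and is not a real RagA or SusC sequence.

```python
import re


def motif_to_regex(motif, wildcard="×"):
    """Compile a motif in which the wildcard symbol matches any single residue."""
    parts = ["." if ch == wildcard else re.escape(ch) for ch in motif]
    return re.compile("".join(parts))


TONB_BOX_II = "LKDA××T×IYG×RA×NGV"
pattern = motif_to_regex(TONB_BOX_II)

# Hypothetical peptide constructed to contain the motif, for illustration only.
toy = "MS" + "LKDAAATSIYGSRAANGV" + "QQ"
match = pattern.search(toy)
print(match.start(), match.group())
```

The same function works for the TonB box I motif, since it relies only on the × convention and not on the motif length.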

Fig. 3a, b Comparison of the P. gingivalis rag locus with Bacteroides Sus. a, b Multiple amino acid sequence alignment between the P. gingivalis RagA locus and Bacteroides SusC. The asterisks indicate positions within the proteins that were identical. The carboxypeptidase regulatory region, TonB-dependent receptor plug domain, TonB-dependent receptor, lipoprotein, and RagB/SusD domains are shown

Selection

We investigated whether the rag locus was evolving neutrally by examining the ratios of nonsynonymous to synonymous changes (dN/dS) along each branch of the tree. The ragA locus had a dN/dS of 0.23, and the ragB locus had a dN/dS of 0.19. Both values were significantly lower than the neutral dN/dS of 1 (P < 0.0001, likelihood ratio test, Bonferroni correction for multiple tests), which indicated that these genes were probably evolving under long-term purifying selection.

Discussion

Although the exact roles of P. gingivalis in the initiation and progression of periodontitis remain unclear, it is a known risk factor for periodontitis. Furthermore, accumulated evidence shows that infection with P. gingivalis might predispose an individual to cardiovascular disease, diabetes, and the delivery of preterm infants [23–26]. These findings highlight the importance of preventing infection by this bacterium. In this respect, information on the virulence factors and genotypes of P. gingivalis among individuals is important for developing treatment strategies and preventing transmission of the pathogen to uninfected individuals [12]. Previous studies have shown that P. gingivalis produces a number of well-characterized virulence factors, including proteases, fimbriae, and capsule [27]. Recently, a significant association was observed between carriage of the rag locus and a highly virulent phenotype in a murine model of soft tissue destruction [3, 4, 10, 28].

To further determine whether the P. gingivalis rag locus was associated with periodontitis, we investigated the prevalence of P. gingivalis and the distribution of rag locus variants in periodontitis patients. We detected P. gingivalis in 29 of 110 patient GCF samples. The prevalence of P. gingivalis in localized chronic periodontitis patients (1.75%) and generalized chronic periodontitis patients (27%) was lower than previously reported [29]. We suspected that the sample collection method might be insensitive, but others have reported a high prevalence of P. gingivalis using this method [30, 31]. In contrast, the proportion of necrotizing periodontal disease patients carrying P. gingivalis (95%) was in accordance with other reports [32–34]. Among the 29 cases containing P. gingivalis, rag locus variants were amplified from only 19 samples. Of these 19 cases, 18 (95%) were from necrotizing periodontal disease patients and one (5%) was from a generalized chronic periodontitis patient. All 19 cases had at least three sites with a PD of 8 to 10 mm, accompanied by clearly visible plaque or BOP and CAL; in addition, some rag locus carriers also had diabetes or cardiovascular disease. Together, these findings indicated that the rag locus may be associated with severe periodontitis. We also found that rag1 may be the prevalent variant in Zhenjiang, whereas other rag locus variants were common in other regions [29, 34–37], which showed that the rag locus distribution had obvious regional differences, as does bacterial resistance to antibiotics [38].

The rag locus has a restricted distribution within the species and a lower G+C content. Furthermore, the rag locus is located downstream of IS1126. Insertion sequences (IS) are mobile elements that contribute to gene transposition, which can lead to the inactivation of genes, the transcriptional activation of dormant genes, and genomic rearrangements; these are the main causes of the pathogenicity and genetic diversity of bacterial populations [39–41]. We therefore speculated that the rag locus might have arisen from exogenous genes acquired through transposition. To test this speculation, a phylogenetic analysis was performed to determine the evolutionary relationships between the P. gingivalis rag locus and homologous sequences. Many proteins were analyzed, for example, RagA/B, Sus, and CirA. All of these proteins belong to the family of TonB-linked outer membrane receptors, which are involved in the recognition and active transport of specific external ligands [36]. In other organisms, a TonB-linked outer membrane receptor is also co-transcribed with an outer membrane lipoprotein [42]. The co-distribution and co-transcription of ragA and ragB strongly suggested that they were functionally linked, and that their products might form a complex on the outer surface of P. gingivalis that is involved in a TonB-linked process. RagA, Sus, and CirA were involved in maltose uptake, and RagB was associated with the active transport of lactoferrin and transferrin iron sources. RagB was closely similar to RagB/SusD of Prevotella sp., Bacteroides sp., and Capnocytophaga sp. Furthermore, the tree showed that branches 1 and 2 in cluster I were paralogous, as were the two branches in cluster III, which indicated that ragA was highly similar to the TonB-dependent receptor of Bacteroides sp., and that ragB was similar to ragB/SusD of Prevotella sp., Bacteroides sp., and Capnocytophaga sp.

To further determine the origin of the rag locus, the G+C contents of genes highly homologous to rag were analyzed. The analysis revealed that the G+C contents of the rag locus were below the average level of the P. gingivalis complete genome, including HmuR, and were close to those of Bacteroides sp. Sus. Moreover, multiple alignments between the rag locus and Sus showed that RagA and SusC both consisted of three domains (TonB I, II, and III), and that RagB and SusD both consisted of two domains (lipoprotein and RagB/SusD). The amino acids were highly conserved, and our tests for selection showed that the rag locus was probably evolving under long-term purifying selection, which also indirectly indicated that the rag locus might have arisen from horizontal gene transfer.

From all of the evidence mentioned above, we could conclude that the rag locus did not arise by duplication of the organism's own HmuR, but instead arose from Bacteroides sp. Sus via horizontal gene transfer. The exogenous rag locus promoted the expression of virulence genes or the activation of dormant genes in P. gingivalis. Thus, P. gingivalis differentiated into virulent and avirulent genotypes. The genotype carrying the rag locus may be associated with severe periodontitis.