Background

Tropheryma whipplei is an actinobacterial pathogen that causes Whipple’s disease in Homo sapiens. The disease is associated with gastroenteritis, endocarditis, and neuronal damage, and was described in Caucasian individuals [1]. Its lethal impact has additionally been observed in canines [2]. The disease is named after the Nobel laureate G. H. Whipple, who carried out extensive investigations of the lipodystrophy (impaired lipid biosynthesis and uptake) caused by T. whipplei [3], an organism with a broad spectrum of infection. Caucasian populations, children, and sewage and agricultural workers were found to be the groups most commonly affected by this disease. The bacterium causes immunomodulation with increased IL-16 secretion, IL-10 synthesis, and dysregulation of mucosal T-helper cells. Further immunological abnormalities have been described, reflecting the multifaceted nature of Whipple’s disease [4]. Clinical symptoms include severe diarrhea, loss of body weight, and weakness [5]. T. whipplei invades the lamina propria of the gastrointestinal tract and targets macrophages for its replication [6]. Two strains of T. whipplei (Twist and TW08/27) were successfully sequenced by French researchers, which opened the way for genomic analysis and the development of better treatment strategies for this lethal disease; that work found that this actinobacterium has a low GC content (46%) in comparison with other members of the same order [7].

Current medicines such as doxycycline, hydroxychloroquine, and trimethoprim/sulfamethoxazole must be taken for almost 2 years, and patients require lifetime follow-up [8, 9]. A recent in silico study on epitope-based vaccine design suggests a possible prophylaxis for Whipple’s disease [10]. This actinobacterium encodes a large number of surface proteins, some of which are associated with a large content of noncoding repetitive DNA. The genome also shows variability in genomic sets, including phase variations that modify cell proteins; this points to the importance of immune evasion and of association with the host genome [1, 7]. Such unusual genomic features make the bacterium a good subject for codon usage analysis aimed at uncovering natural selection and mutational pressure. A codon consists of three consecutive nucleotides and encodes either a particular amino acid or a STOP signal for translation. Unequal use of synonymous codons is termed codon usage bias. Synonymous codon usage in many unicellular prokaryotes is consistently linked to directional mutational bias and translational selection [11]. Other factors, such as replication-translation selection and protein hydropathy, can also have a significant impact [12]. In some microbial pathogen species, mutational bias was found to be strand specific, and those organisms show varied synonymous and nonsynonymous codon usage [13]. This study not only provides insights into the natural and mutational selection pressures acting at the genomic level in T. whipplei but also offers a better understanding of evolutionary developments in this host-adapted bacterium.
This computational analysis revealed information about highly translated proteins and enzymes of this bacterium, and the amino acids that can be considered in epitope-based prophylaxis design to inhibit bacterial activity in the host or to develop better treatments, as in recent immunoinformatics-based studies [14, 15]. Ribosomal RNA (16S and 23S) codon usage patterns were analyzed to determine changes associated with the evolutionary or phylogenetic patterns of the bacterium. In this study, we also identified epitope-based peptide vaccine candidates against Tropheryma whipplei. The aim of the study was to determine codon usage patterns in T. whipplei and, on that basis, to predict epitope-based vaccine candidates using current bioinformatics tools.

Methods

Codon data retrieval

To measure codon usage bias, codon usage tables were retrieved from the Codon and Codon-Pair Usage Tables (CoCoPUTs) database. This database reports the relative frequency with which different codons are used in genes in the T. whipplei RefSeq data. Similarly, the codon-pair usage tables give the counts of each codon pair in the CDSs of the T. whipplei genomic data (RefSeq), from which codon-pair usage bias was calculated.
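As a rough illustration of what a codon usage table and a codon-pair usage table contain, the sketch below counts codons and adjacent codon pairs in a single in-frame coding sequence. The function name and the toy sequence are hypothetical and are not taken from CoCoPUTs, which aggregates such counts over all RefSeq CDSs of an organism.

```python
from collections import Counter

def count_codons_and_pairs(cds):
    """Count codons and adjacent codon pairs in one in-frame CDS."""
    cds = cds.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    codon_counts = Counter(codons)
    # A codon pair is two neighbouring codons read in the same frame.
    pair_counts = Counter(zip(codons, codons[1:]))
    return codon_counts, pair_counts

# Hypothetical toy CDS (not a T. whipplei gene).
toy_cds = "ATGGTTGATGTTGATTAA"
codon_counts, pair_counts = count_codons_and_pairs(toy_cds)
```

Summing such counters over every CDS in a genome would give raw tables of the kind described above; the bias statistics reported by the database are derived from those counts.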

Retrieval of genomic data and codon usage table

From the complete nucleotide sequences of T. whipplei strains, the FASTA sequences of the Twist 16S ribosomal RNA and 23S ribosomal RNA were selected and retrieved from the NCBI RefSeq database (https://www.ncbi.nlm.nih.gov/nuccore). The codon usage dataset was retrieved from the Codon Usage Database (http://www.kazusa.or.jp/codon/).

Genomic sequence optimization

All codons in the original sequences of the T. whipplei strains were replaced with the synonymous codon having the highest usage frequency. The ATGme tool [16] (http://www.atgme.org/) was used to identify rare codons and optimize the genomic sequences accordingly. Genomic sequences in FASTA format were pasted into the search box and the codon usage table into the corresponding field, and the data were then processed to identify rare codons and optimize the sequences.
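The replacement rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ATGme implementation: the function names are hypothetical, the genetic code is assumed to be the standard one (NCBI translation table 1), and the usage table contains only the four frequencies per thousand quoted in this article, so amino acids without an entry simply fall back to the first codon in table order.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def split_codons(cds):
    """Return the complete in-frame codons of a sequence."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def most_used_codon_per_aa(usage):
    """For each amino acid, pick the synonymous codon with the highest usage."""
    best = {}
    for codon, aa in CODON_TO_AA.items():
        if aa not in best or usage.get(codon, 0.0) > usage.get(best[aa], 0.0):
            best[aa] = codon
    return best

def optimize(cds, usage):
    """Replace every codon with the most used synonymous codon."""
    best = most_used_codon_per_aa(usage)
    return "".join(best[CODON_TO_AA[c]] for c in split_codons(cds))

def translate(cds):
    return "".join(CODON_TO_AA[c] for c in split_codons(cds))

# Partial usage table: only the four values reported in the text (per 1000 codons).
PARTIAL_USAGE = {"GTT": 37.06, "GAT": 37.03, "CTT": 32.53, "TTT": 30.88}

# Hypothetical toy CDS encoding Met-Val-Asp-Leu-Phe-stop.
toy_cds = "ATGGTCGACCTGTTCTAA"
optimized = optimize(toy_cds, PARTIAL_USAGE)
```

The optimized sequence encodes the same protein as the input; only the choice among synonymous codons changes.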

Codon usage measurements

Nucleotide composition was computed from the identified ribosomal RNA gene sequences. The G + C content at the 1st, 2nd, and 3rd codon positions (GC1s, GC2s, and GC3s) was determined, together with the mean frequencies. The frequencies of each base at the synonymous third codon position (A3, T3, G3, and C3) and the corresponding percentages (%A3s, %C3s, %T3s, and %G3s) were calculated. To measure synonymous codon bias, the effective number of codons (ENC) was determined. Additionally, codon usage, codon usage per thousand, and relative synonymous codon usage (RSCU) were calculated using the CAIcal tool available at https://ppuigbo.me/programs/CAIcal/.
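To make the last two quantities concrete, the sketch below computes codon usage per thousand and RSCU for one sequence. It assumes the usual definition of RSCU (observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally) and the standard genetic code; the function name and toy input are hypothetical, and this is not the CAIcal code.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_metrics(cds):
    """Return (frequency per thousand, RSCU) dictionaries for one sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TO_AA)
    total = sum(counts.values())
    per_thousand = {c: 1000.0 * n / total for c, n in counts.items()}

    # Group synonymous codons by the amino acid they encode.
    families = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        families[aa].append(codon)

    rscu = {}
    for aa, synonyms in families.items():
        family_total = sum(counts[c] for c in synonyms)
        if family_total == 0:
            continue  # amino acid absent from the sequence
        for c in synonyms:
            # observed count / (family total / number of synonymous codons)
            rscu[c] = counts[c] * len(synonyms) / family_total
    return per_thousand, rscu

# Hypothetical toy input: three GTT and one GTC (all valine).
per_thousand, rscu = codon_metrics("GTTGTTGTTGTC")
```

An RSCU above 1 marks a codon used more often than expected under equal usage, and a value below 1 marks one used less often.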

Epitope-based vaccine prediction

Proteomic data for Tropheryma whipplei were obtained from the NCBI GenBank database, and allergenicity was estimated with the AllergenFP server [17]. The NetMHCIIpan-4.0 server [18] was used to screen the selected proteins for epitopes that can interact with human leukocyte antigen (HLA) proteins. The VaxiJen 2.0 tool [19] was used to assess the antigenicity of the screened epitopes. Epitope structures were predicted with PEP-FOLD 3.5 [20], and the HLA allelic determinant HLA DRB1_0101 (PDB-ID:1AQD) was retrieved from the RCSB-PDB database. The biochemical properties of the epitopes were calculated with the ProtParam tool of the ExPASy web server.

Molecular docking between epitopes and HLA determinants was performed with the PatchDock [21], FireDock, and DINC [22] web tools. These tools not only provide a user-friendly docking workflow but also calculate parameters such as global energy, atomic contact energy, and binding energy for the docked complexes.

Results

Identified codons and calculated usage bias

The codon-pair usage table and dinucleotide usage data were obtained from the CoCoPUTs database [23, 24]. The T. whipplei taxonomy ID (taxid 2039) was verified with NCBI’s taxonomy tool, and the taxonomy is illustrated in Fig. 1. The log-transformed codon-pair frequency heat map obtained from the analysis is shown in Fig. 2. ENC values range from 20 to 61 [25]. A value of 20 means that only one codon is used for each amino acid, whereas a value of 61 means that all synonymous codons are used equally for each amino acid. The ENC value computed in our analysis was 56.138, which means that more than one codon was used for each amino acid. An ENC value ≤ 35 indicates significant codon bias [26], so the high ENC value indicates low codon usage bias in T. whipplei. The ENC values are detailed in Table 1.
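The two extremes of the ENC scale can be reproduced with a short sketch. It assumes Wright's formulation, Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where each F is the mean codon homozygosity of the amino acids in one degeneracy class of the standard genetic code. The article does not state which implementation produced the reported value of 56.138, so the function below is illustrative only; it also raises an error instead of applying the usual correction when a degeneracy class is missing.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of amino acids in each degeneracy class (2-, 3-, 4-, 6-fold).
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}

def effective_number_of_codons(codon_counts):
    """Sketch of Wright's ENC from a dict of codon -> count."""
    families = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        if aa != "*":
            families[aa].append(codon)

    homozygosity = defaultdict(list)  # degeneracy -> list of F values
    for synonyms in families.values():
        k = len(synonyms)
        n = sum(codon_counts.get(c, 0) for c in synonyms)
        if k == 1 or n < 2:
            continue  # Met, Trp, or too few observations
        squares = sum((codon_counts.get(c, 0) / n) ** 2 for c in synonyms)
        f = (n * squares - 1) / (n - 1)
        if f > 0:
            homozygosity[k].append(f)

    nc = 2.0  # Met and Trp each contribute exactly one codon
    for k, size in CLASS_SIZES.items():
        if not homozygosity[k]:
            raise ValueError(f"no usable amino acids with {k} synonymous codons")
        nc += size / (sum(homozygosity[k]) / len(homozygosity[k]))
    return min(nc, 61.0)  # values above 61 are capped

# Extreme bias: exactly one codon used per amino acid.
one_codon_per_aa = {}
for codon, aa in CODON_TO_AA.items():
    if aa != "*" and aa not in {CODON_TO_AA[c] for c in one_codon_per_aa}:
        one_codon_per_aa[codon] = 100

# No bias: every sense codon used equally often.
uniform = {c: 100 for c, aa in CODON_TO_AA.items() if aa != "*"}

enc_biased = effective_number_of_codons(one_codon_per_aa)
enc_uniform = effective_number_of_codons(uniform)
```

Under this formulation the biased table gives 20 and the uniform table gives the capped maximum of 61, which brackets the value of 56.138 reported here.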

Fig. 1 Taxonomy and strains of Tropheryma whipplei

Fig. 2 Heatmap of log-transformed codon usage

Table 1 Effective number of codon pairs for each T. whipplei

The codon usage details are summarized in Table 2, and the codon usage frequency per 1000 codons is illustrated in Fig. 3. The T. whipplei RefSeq data (n = 859) comprised 88597 CDSs and 28006357 codons. Table 2 presents the codon table for these CDSs. The codons GTT (37.06), GAT (37.03), CTT (32.53), and TTT (30.88) had the highest usage frequencies (values in parentheses). Dinucleotide frequencies per 1000 dinucleotides are shown in Fig. 4.

Table 2 Tropheryma whipplei RefSeq codon table contains 88597 CDSs (28006357 codons)
Fig. 3 Codon frequencies of Tropheryma whipplei

Fig. 4 Dinucleotide frequencies of Tropheryma whipplei

Tropheryma whipplei str. Twist codon usage table

The complete sequences of the 23S and 16S ribosomal RNA genes of Tropheryma whipplei strain Twist comprised 3102 and 1521 base pairs, respectively. The CDSs, codons, frequencies per thousand, and numbers of codons are summarized in Table 3 (strain Twist) and Table 4 (strain TW08/27). These codon usage tables were used for the identification of rare codons and sequence optimization.

Table 3 Tropheryma whipplei str. Twist: 808 CDSs (266294 codons); codons, frequency per thousand, and number of codons in brackets
Table 4 Tropheryma whipplei TW08/27: 783 CDSs (261028 codons); codons, frequency per thousand, and number of codons in brackets

Rare and very rare codons

The analysis yielded usage data, the original sequence, and the optimized sequence. For the Tropheryma whipplei strain Twist 23S ribosomal RNA gene sequence, the usage data showed that GTT and GAT (36.7% and 36.3%) had the highest codon usage frequencies. TAA, TAG, and TGA, which code for “STOP”, had the lowest usage frequencies (0.9%, 1.0%, and 1.1%) and were classed as very rare codons. The rare codons were CGA, TGC, CGG, TGT, CAC, ACG, CCC, and TCG. Stop codons terminate protein translation [27]. The details of the rare and very rare codons (encoded residue, count, and percentage usage frequency) for the 23S and 16S rRNA are summarized in Tables 5 and 6.

Table 5 Tropheryma whipplei strain Twist 23S ribosomal RNA gene
Table 6 Tropheryma whipplei str. Twist 16S ribosomal RNA

Codon measurement

The overall nucleotide frequencies calculated for the Tropheryma whipplei Twist strain sequences were A% (25.11 and 23.54), C% (22.76 and 24.0), T% (20.76 and 19.4), and G% (31.37 and 33.07) in the 23S and 16S ribosomal RNA genes, respectively. The base content at the third position of synonymous codons was A3S% (24.47 and 22.88), C3S% (20.99 and 22.88), T3S% (21.47 and 19.53), and G3S% (33.08 and 34.71) for the 23S and 16S rRNA, respectively. GC3S%, the GC content at the third position of synonymous codons, was 52.85 and 57.85 for the 23S and 16S rRNA, respectively. Figures 5 and 6 show rRNA features such as length and nucleotide composition. Figure 7 gives the percentages of rRNA synonymous codons, and Fig. 8 shows the codon measurements.
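The compositional measures reported above can be computed along the following lines. This is a sketch under stated assumptions: the sequence is read in frame from its first base, "synonymous" third positions are taken to exclude the Met (ATG), Trp (TGG), and stop codons, and the function names and toy sequence are hypothetical rather than taken from CAIcal.

```python
# Codons without synonymous alternatives (Met, Trp) and the three stop codons.
NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}

def normalise(seq):
    """Upper-case the sequence and treat RNA 'U' as 'T'."""
    return seq.upper().replace("U", "T")

def base_percentages(seq):
    """Overall A%, C%, G%, T% of a sequence."""
    seq = normalise(seq)
    return {b: 100.0 * seq.count(b) / len(seq) for b in "ACGT"}

def gc_by_position(seq):
    """GC content (%) at codon positions 1, 2, and 3."""
    seq = normalise(seq)
    seq = seq[: len(seq) - len(seq) % 3]  # keep complete codons only
    result = {}
    for pos in range(3):
        bases = seq[pos::3]
        result[f"GC{pos + 1}"] = 100.0 * sum(b in "GC" for b in bases) / len(bases)
    return result

def third_position_synonymous(seq):
    """A3s, C3s, G3s, T3s, and GC3s (%) over synonymous codons."""
    seq = normalise(seq)
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    thirds = [c[2] for c in codons if c not in NON_SYNONYMOUS]
    pct = {f"{b}3s": 100.0 * thirds.count(b) / len(thirds) for b in "ACGT"}
    pct["GC3s"] = pct["G3s"] + pct["C3s"]
    return pct

# Hypothetical toy sequence: ATG GCC GCT TAA.
toy = "ATGGCCGCTTAA"
overall = base_percentages(toy)
by_position = gc_by_position(toy)
third = third_position_synonymous(toy)
```

In the toy sequence only GCC and GCT count as synonymous codons, so the third-position statistics are computed from two bases (C and T).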

Fig. 5 rRNA length and nucleotide composition

Fig. 6 Percentage of rRNA nucleotide composition

Fig. 7 Percentage of rRNA synonymous codons

Fig. 8 Codon measurement values

Epitope-based vaccine prediction: application of codon usage studies

The in silico analysis revealed two epitopes of 15 amino acid residues (KPSYLSALSAHLNDK and FKSFNYNVAIGVRQP) that interact favorably with HLA-DRB-0101 (an MHC class II allelic determinant). Table 7 lists the retrieved sequences with their accession numbers, together with the allergenicity predicted by the AllergenFP tool (which reports a Tanimoto similarity index). Epitopes were determined with the NetMHCIIpan-4.0 server, which draws its core data from the IEDB database and uses artificial neural networks (ANN) to assess the interaction of peptide stretches with HLA allelic determinants. Amino acids such as valine, aspartate, leucine, and phenylalanine have high codon usage frequencies and were also found in the screened epitopes from excinuclease ABC subunit UvrC and 3-oxoacyl-ACP reductase FabG. Although a total of 2151 epitopes were discovered, Table 8 lists the 10 peptides with good VaxiJen scores, together with their NetMHCIIpan-4.0 scores. The VaxiJen score indicates the antigenicity of a peptide. The ProtParam results showed that only the two finalized epitopes were stable (Table 9). Epitope structures were predicted with PEP-FOLD 3.5 [20], and the HLA allelic determinant HLA DRB1_0101 (PDB-ID:1AQD) was retrieved from the RCSB-PDB database for molecular docking analysis. Molecular docking of the selected epitopes with HLA-DRB0101 showed favorable interaction (Table 10). Figure 9 shows the docked complexes of the selected epitopes with HLA-DRB-0101, visualized in PyMOL.
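The link drawn above between high-usage codons and epitope composition can be checked directly on the two reported peptides. The short sketch below counts the four residues named in the text (valine, aspartate, leucine, and phenylalanine) in each epitope; the variable and function names are hypothetical.

```python
# One-letter codes of the residues whose codons showed high usage frequency.
HIGH_USAGE_RESIDUES = {
    "V": "valine",
    "D": "aspartate",
    "L": "leucine",
    "F": "phenylalanine",
}

# The two epitopes reported in this study, keyed by source protein.
EPITOPES = {
    "UvrC": "KPSYLSALSAHLNDK",
    "FabG": "FKSFNYNVAIGVRQP",
}

def high_usage_counts(peptide):
    """Count occurrences of each high-usage residue in a peptide."""
    return {residue: peptide.count(residue) for residue in HIGH_USAGE_RESIDUES}

counts = {name: high_usage_counts(seq) for name, seq in EPITOPES.items()}

for name, seq in EPITOPES.items():
    share = 100.0 * sum(counts[name].values()) / len(seq)
    print(f"{name}: {counts[name]} ({share:.1f}% of residues)")
```

Each epitope contains four such residues out of 15: the UvrC peptide carries three leucines and one aspartate, and the FabG peptide carries two valines and two phenylalanines.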

Table 7 AllergenFP score and proteins considered for Tropheryma whipplei
Table 8 Peptides showing interaction with HLA-DRB0101, NetMHCIIpan-4.0 server results, and VaxiJen score
Table 9 ProtParam results: biochemical properties of epitopes
Table 10 ACE value, global energy, and binding energy for selected docked complexes (epitopes to HLA DRB0101)
Fig. 9 Molecular docking results of epitopes with HLA-DRB-0101. A KPSYLSALSAHLNDK from protein excinuclease ABC subunit UvrC and B FKSFNYNVAIGVRQP from protein 3-oxoacyl-ACP reductase FabG

Discussion

Tropheryma whipplei causes conditions ranging from acute gastroenteritis to neuronal damage in Homo sapiens. Genomic and codon adaptation studies can help advance the prediction of disease evolution and the prevention and treatment of the disease. The codon-pair usage table and dinucleotide usage data were obtained from the CoCoPUTs database [23, 24]. The ENC value computed in our analysis was 56.138, which means that more than one codon was used for each amino acid; an ENC value ≤ 35 is required for significant codon bias [26]. The CDSs, codons, frequencies per thousand, and numbers of codons of the Tropheryma whipplei Twist strain were used to identify rare codons and to optimize sequences. Relative synonymous codon usage (RSCU) is the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons for that amino acid were used equally [28]. The Codon Adaptation Index, which measures the degree of bias toward preferred codons, was 0.73 and 0.725 for the 23S and 16S rRNA, respectively. The index ranges between 0 and 1; higher values indicate stronger codon usage bias and a higher gene expression level. Previous studies associated membrane proteins with considerable bias [29], whereas the current study identified rare codon bias across the entire genome of T. whipplei. Codon bias studies help determine amino acid expression patterns that can be linked to epitope-based vaccine prediction. In recent studies, vaccine predictions for SARS-CoV-2 [30, 31], dengue [32, 33], Nipah [34], Candida fungus [35], Canine circovirus [36], and Zika virus [37] were reported to be successful. Codon usage pattern determination can therefore be considered a preliminary step before deploying any ANN (artificial neural network)-based web server or tool, such as the NetMHC server, to screen essential epitopes of short peptide length (8–12 amino acids).
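For reference, the Codon Adaptation Index discussed in this section can be sketched as the geometric mean of per-codon relative adaptiveness values, where each value is a codon's usage divided by the usage of the most used synonymous codon in a reference table. The code below assumes that common definition and the standard genetic code; the reference table and toy sequence are hypothetical, codons with zero weight are skipped for simplicity, and this is not the CAIcal implementation.

```python
import math
from collections import defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def relative_adaptiveness(reference_usage):
    """w = codon usage / usage of the most used synonymous codon."""
    families = defaultdict(list)
    for codon, aa in CODON_TO_AA.items():
        families[aa].append(codon)
    weights = {}
    for aa, synonyms in families.items():
        if aa == "*" or len(synonyms) == 1:
            continue  # stop, Met, and Trp carry no synonymous choice
        top = max(reference_usage.get(c, 0.0) for c in synonyms)
        if top == 0:
            continue  # amino acid absent from the reference table
        for c in synonyms:
            weights[c] = reference_usage.get(c, 0.0) / top
    return weights

def codon_adaptation_index(cds, weights):
    """Geometric mean of the weights of the codons in a sequence."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0.0) > 0]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))

# Hypothetical reference usage for the four valine codons.
reference = {"GTT": 40.0, "GTC": 10.0, "GTA": 20.0, "GTG": 30.0}
weights = relative_adaptiveness(reference)

cai_mixed = codon_adaptation_index("GTTGTC", weights)      # sqrt(1.0 * 0.25)
cai_preferred = codon_adaptation_index("GTTGTT", weights)  # only the top codon
```

A sequence built only from the most used codons scores 1, and the score falls as less favoured synonymous codons are used.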
The overall nucleotide frequencies calculated for the Tropheryma whipplei Twist strain sequences were A% (25.11 and 23.54), C% (22.76 and 24.0), T% (20.76 and 19.4), and G% (31.37 and 33.07) in the 23S and 16S ribosomal RNA genes, respectively. The in silico analysis revealed two epitopes of 15 amino acid residues (KPSYLSALSAHLNDK and FKSFNYNVAIGVRQP) that interact favorably with HLA-DRB-0101 (an MHC class II allelic determinant); future work includes attaching linkers and adjuvants and synthesizing these epitopes by solid-phase synthesis so that they can be tested in model organisms. Recent developments in immunoinformatics offer novel ways to predict epitope-based vaccine candidates and therapeutics against many harmful pathogens such as Candida auris [35] and human cytomegalovirus [38]. Bioinformatic approaches have likewise facilitated drug repurposing against harmful pathogens [39], and viral pathogenic proteomes have been screened by immunoinformatics for vaccine design in animal models [33, 36, 40]. The approach used in this study can save time and money in peptide-based vaccine design.

Conclusions

The codon usage and amino acid usage analyses indicate that T. whipplei has a low codon usage bias. The base content at the third position of synonymous codons was A3S% (24.47 and 22.88), C3S% (20.99 and 22.88), T3S% (21.47 and 19.53), and G3S% (33.08 and 34.71) for the 23S and 16S rRNA, respectively. The codon usage patterns also suggest a lower likelihood of variational or evolutionary alterations in the T. whipplei genome. The analysis could be applied to disease evolution prediction and to the development of drugs or vaccine candidates. We also found that two epitopes, KPSYLSALSAHLNDK and FKSFNYNVAIGVRQP, can possibly act as vaccine candidates against T. whipplei. Future work requires wet-lab validation of these epitopes, which are highly expressed in this bacterium and have potential as therapeutic peptides.