Introduction

Alphaviruses belonging to the group IV togoviridae family are positive sense, single stranded RNA viruses. These enveloped viruses are classified as old world and new world. While new world viruses are encephalitogenic, members of the old world are known to induce polyarthritis1. The members of the old world viruses include Chikungunya Virus (CHIKV), Ross River Virus (RRV), Mayaro Virus (MAYV), O’nyong nyong virus (ONV), Semiliki Forest Virus (SFV) and the Barmah Forest Virus (BFV). These mosquito transmitted viruses are globally distributed and are known to cause acute febrile illnesses, malaise, maculopapular rashes, myalgia, and severe arthralgia in humans2,3,4,5. Most often the infection remains endemic, but some viral strains are also associated with large epidemics such as the Chikungunya virus (CHIKV) out-break which was spread across 40 countries with 1.4–6.5 million reported cases globally6. Several studies also report a simultaneous increase in alpha virus associated arthritis lasting even after decrease in viral load7.

While there is a resurgence of alphavirus associated arthritis cases, the knowledge about the molecular level events during infections involved in these conditions and about the direct cause and effect of the phenomenon is very sparse. For instance, macrophages, natural killer cells, CD4+ and CD8+ T lymphocytes have been shown as the main components of the inflammatory cellular infiltrate in animal models of CHIKV and RRV infections8,9,10. Production of a broad range of pro-inflammatory cytokines (IL-6, TNF, IFN-α/β, and IFNγ) and chemokines (MCP-1 and RANTES) post alpha viral infections has also been reported by Lidbury et al.11. Interestingly, a comparative study between CHIKV infection induced gene expression and rheumatoid arthritis induced gene expression changes in mouse models showed similarity in induced cytokine (TNF, IL-8, IL-15, IFN-γ, GM-CSF and lymphotoxin B) profiles12. Another study by Morrison et al.13, demonstrated the presence of activated complement products in the serum of RRV infected mice, drawing similarities with the pathogenesis of rheumatoid arthritis13.

Resemblance between proteins of pathogens and host components has been examined as a trigger for many autoimmune diseases14. In these scenarios, initiation of autoimmune disease by an infectious pathogen is hypothesized to involve immune recognition of self-peptides by way of “molecular mimicry”15 thereby activating autoreactive T cells and generation of autoantibodies. Previous studies16,17 hypothesized that microbial antigens like pulD from Klebsiella sp., nuclear antigen-1 from Epstein-Barr virus and OSP-A from Borrelia sp. may have a possible association with autoimmune diseases like ankylosing spondylitis, Systemic Lupus Erythematosus (SLE) and Lyme arthritis, respectively. A significant sequence similarity between P2-C protein of Coxsackie virus and glutamate decarboxylase of humans has been proposed as a trigger for Type 1 diabetes18. The similitude in clinical manifestation of an alphaviral infection induced arthralgia and rheumatoid arthritis at both phenotypic and molecular levels suggests that alphaviral infections could be a causative link for rheumatoid arthritis.

Overarching goal of this study is to explore the possibility of existence of peptides in alphaviruses which may set off an autoimmune response resulting in rheumatoid arthritis like symptoms in alphavirus infected patients. Structural polyproteins of the alphaviruses were scanned for the presence of conserved stretches of amino acids that might share homology with sequences in human proteins implicated in rheumatoid arthritis. These conserved regions were validated in silico both at sequence and structural level for their immunogenic potential in terms of their ability to act as T and B cell epitopes. Overall, our results posit that polyarthritis associated with alpha viral infections may also involve an autoimmune component due to cross reactivity between viral epitopes and host proteins.

Results

Arthritogenic alpha viruses such as Chikungunya virus (CHIKV), Ross river virus (RRV), Semiliki forest virus (SFV), Mayaro virus, O’nyong-nyong virus (ONV) and Barmah forest virus (BFV) are generally associated with rheumatic diseases in humans, primarily characterized by polyarthralgia and polyarthritis. Although most patients recover within few weeks, the clinical conditions last for 6 months to more than 3 years6. However, molecular mechanisms contributing to this pathology are not very well understood. In the present study, we show a possible role of autoimmune process mediated through molecular mimicry as a likely cause for arthritic like conditions post alphaviral infections. The overall workflow of the study is depicted in Fig. 1.

Figure 1
figure 1

Computational pipeline for the prediction of possible induction of rheumatoid arthritis through molecular mimicry post alphaviral infection.

Multiple sequence alignment reveals conserved regions in the structural polyproteins of arthritogenic alpha viruses

Alphaviral infections share a common feature of inducing RA like conditions, but the commonality in the alphaviruses leading to such a rheumatic outcome is not known. We hypothesized that the commonality could lie in the structural polyprotein of these viruses. Therefore, we have analyzed the structural polyproteins of six arthritogenic alpha viruses for the presence of conserved regions amongst them. As described in the methods section, amino acid sequences of the structural polyproteins of CHIKV, RRV, SFV, ONV, BFV and Mayaro virus were retrieved from UniProt database and subsequently multiple sequence alignment was performed using MUSCLE. Conserved regions identified are shown in Supplementary Fig. 1. The positions of the conserved regions have been depicted in Fig. 2 using the structural polyprotein of the Chikungunya virus as a representative model.

Figure 2
figure 2

Domain diagram of the regions conserved in the structural polyprotein of the alphaviruses. The regions have been depicted using Chikungunya virus (strain S27 African prototype) structural polyprotein as a representative model.

The structural polyproteins of arthritogenic alphaviruses are about 71% identical at the primary (amino acid sequence) level. A total of 41 conserved regions were identified in the structural polyproteins, with E1 glycoprotein region exhibiting highest conservancy with 18 conserved regions, followed by p62 and capsid proteins harboring 15 and 8 conserved regions respectively.

Conserved regions across alpha viruses share homology with human proteins implicated in arthritis

To deduce the possibility of autoimmune reactions involved in mediating post alphaviral polyarthralgia through the presence of molecular mimics, we performed a sequence similarity search between the conserved regions among arthritogenic alpha viruses and the human proteome using the standard BLASTp program. Each of the 41 conserved regions was scanned against the human proteome. Very interestingly, 24 regions showed varying degrees of homology (52.7% to 100%) with human proteins that have been implicated in arthritis like conditions in humans (Table 1).

Table 1 List of human proteins sharing homology with conserved regions of structural polyproteins of alphaviruses and shown to contribute to rheumatoid arthritis-like-condition.

Potential mimic regions harbor HLA binding motifs

Human leukocyte antigen (HLA) is the most widely distributed molecule with high level of polymorphism and are the major governing factors in initiating cellular immune responses. It is thus important that immunogenic peptides bind to multiple HLA alleles to elicit a strong immune response. The 24 conserved regions of the structural polyprotein (refer Table 1) were analyzed for HLA class II binding ability using propred II server. The nonameric peptides generated from each conserved region were scanned against 51 HLA alleles available in the ProPred II server. The number of nonameric peptides generated from each of 24 conserved regions and the number of peptides predicted to bind to different alleles is summarized in Table 2. Peptides generated from 19 of the conserved regions bound to at least one HLA allele, while the peptides from the remaining 5 of the conserved regions did not bind to any of the alleles.

Table 2 HLA binding profiles of potential mimics from the alpha viral structural proteins. A total of 51 HLA class II alleles were used for the study.

As illustrated in Fig. 3, maximum numbers of nonamers were found to be recognized by the allele HLA DRB1_0308, which bound to nine of the conserved region peptides. DRB1_0101 bound to eight and DRB1_1321 bound to seven, DRB1_0102, 0401, 0426, 1101, 1107 and 1120 bound to six conserved region peptides respectively.

Figure 3
figure 3

Number of conserved regions from alpha virus structural proteins binding to individual class II alleles as predicted by ProPred II server. Alleles that exhibited binding to more than 3 peptides are represented out of the 51 alleles studied.

Since the quality of immune response elicited by the mimic peptide in humans should be similar to the one elicited by the viral peptide for an efficient autoimmune response to occur, the structure of the HLA DRII-peptide complex was analyzed. Molecular docking analysis was performed to assess the structural similarity in the binding patterns of the viral peptide and its homologous region in the human protein. The ability of one of the potential conserved regions, FPLANVTCM (viral derived) and its corresponding homologous region KPLANVTLM (human derived) to bind to a class II HLA allele was assessed by docking the peptides in the binding groove of HLA-DRB1, an allele known to predispose individuals to rheumatoid arthritis19. The peptides were docked using GalaxyPepDock server and the docked peptides are shown in Fig. 4. The root mean square deviation between the docked peptides shown in Fig. 4 is 0.86 Å.

Figure 4
figure 4

A Structural representation of the peptide-MHC complex generated by docking of the peptide FPLANVTCM from the alpha virus structural polyproteins (cyan) and the peptide KPLANVTLM from the alpha 1-B glycoprotein of human (red) into the binding groove of HLA-DRB1. Both the peptides exhibit similar fit and interactions with the amino acids (yellow) in the binding pocket of HLA.

The docked p-MHC complexes were analyzed for the hydrogen bonding interactions between the peptides and HLA-DRB1. The binding interactions between the viral and human peptide with HLA-DRB1 were visualized using LigPlot+. Post docking analysis of the peptides revealed significant interactions with HLA-DRB1. In viral peptide-HLA-DRB1 complex, ten hydrogen bonds within a distance of 3.5 Å suggested the stability of the peptide in the binding groove of the pocket. Similarly, a set of eleven hydrogen bonds were formed between human peptide and HLA-DRB1 within a distance of 3.5 Å. Furthermore, it can be seen in Fig. 5 that the amino acids in the conserved regions of both the peptides form identical hydrogen bonds with the amino acids (Ser51, Asn260, Lys249, Trp235, Asn67, Tyr 208) present in the binding pocket of HLA-DRB1.

Figure 5
figure 5

Analysis of molecular interactions of viral peptide docked onto HLA-DRB1 (top) and its homologous region in the human protein docked onto HLA-DRB1 (bottom).

Potential mimics could also act as B cell epitopes

To further strengthen the case of alpha viral induced molecular mimicry, we have analyzed the 24 conserved regions for the presence of both linear and structural B cell epitopes which in turn is a measure of their ability to act as stimulators of B cells and generate an autoimmune reaction against its homologous region in the human protein leading to polyarthralgia. Linear B cell epitope prediction was performed using three different prediction algorithms namely Bepipred linear epitope prediction, Emini surface accessibility scale and Kolaskar and Tongaonkar antigenicity scale. Prediction results are summarized in Table 3.

Table 3 Analysis of conserved regions of alpha viruses sharing homology with human peptides (refer Table 1) for their ability to act as B cell epitopes.

Out of the 24 conserved regions analyzed, 5 regions were predicted to be plausible B cell epitopes by all the three algorithms. 6 regions were predicted to be plausible B cell epitopes by at least 2 of the 3 algorithms tested. 12 regions were predicted to be B cell epitopes by only one of the algorithm. Only one out of the 24 conserved regions was not predicted as B cell epitope by all the three algorithms.

Interestingly, 9 out of the 24 conserved regions shared homology with human proteins which were experimentally proven to have antibodies developed against them in patients diagnosed with rheumatoid arthritis (highlighted in bold in Table 1).

To further validate if the potential mimics could act as plausible B cell epitopes also at a conformational level, CHIKV structural polyprotein was used as a model to locate conformational B cell epitopes using Ellipro server from IEDB. As shown in Fig. 6, the parts of the peptides MKSDASKFTHEKPEGYYNWHHGAVQ and YSGGRFTIPTGAGKPGDSGRPIFDN (labelled A in Fig. 6) present on the Capsid protein, the part of the peptide RKGKIHIPFPLANVTCMVPKA (labelled B), PTVTYGK (labelled C) and EGLEVTWGNN (labelled D) present on the p62 protein are exposed to the surface and predicted to be potential conformational B cell epitopes.

Figure 6
figure 6

Epitopes predicted based on the 3D structure of the Chikungunya virus structural polyprotein (PDB ID: 3J2W) using ElliPro. The predicted peptides are shown in sphere representation and are colored yellow.

Discussion

Viral infection as one of the etiological agents for autoimmune diseases is being discussed for a long time. Several mechanisms have been proposed to explain this phenomenon including molecular mimicry, bystander activation and viral persistence20 individually or in multiple combinations to account for the immunopathology observed at the site of infection and/or sites of autoimmune disease and normally the infections precede the occurrence of inflammation in the target organ21. Triggered immune response as result of infection is critical for viral clearance. However, in some instances, immune regulatory mechanisms may aberrate, leading to the breakdown of self-tolerance, resulting in immune-mediated attack directed against both viral and self-antigens as it happens in the case of molecular mimicry or shared homology between viral and host epitopes22,23.

Human infections caused by the alphaviruses group share a common feature of the clinical picture of arthralgia and chronic arthritis that closely resembles rheumatoid arthritis7,24,25. Further, more severe and delayed recovery of alpha viral disease in patients with pre-existing arthritic conditions has been reported26. Though the cellular components and inflammatory scenario involved in such conditions have been looked at27, knowledge about the actual cause of such persistent symptoms characterized by articular disease and myalgia remains sparse. Though one of the speculations regarding the basis for this rheumatic like manifestations is the possible induction of autoimmunity as a side effect of adaptive immune responses, caused by cross-reactivity between viral and host antigens24,28, there are no systematic studies towards proving this hypothesis.

In the present work, through an in silico analysis we have identified the presence of conserved regions in the structural polyprotein of alphaviruses. A high sequence identity (71%) suggests that the origin of commonality in the clinical features may lie in the amino acid sequence of the structural polyprotein of the viruses. On comparative analysis of these regions of consensus with the human proteome, we have identified proteins that share stretches of homologous regions with the structural polyprotein. Further, very interestingly we found that presence of auto antibodies specific to some of these proteins in rheumatoid arthritis (Table 1). Serum and synovial fluid from RA patients have been shown to contain auto antibodies specific to some of the proteins that we identified in our analysis. These proteins are highlighted in table-1. The underlined region of the peptide SKDVYANTQLVLQRPAAGTVHVPYS has been experimentally tested in mice to induce polyarthralgia. The mouse injected with this peptide recapitulated the pathology seen chikungunya infection29. These reports give further credibility to our findings and strengthen our current premise.

Our results further show that these regions are immuno-dominant stressing their ability for induction of potential auto-immune reaction. Based on the amino acid sequence analysis, twenty-four of the conserved regions (Table 2) bound to HLA class II alleles, suggesting the possibility of calling in for T helper cell responses. Three conserved regions were predicted to be potential B cell epitopes by all the algorithms tested. Since majority of the antigenic epitopes have been shown to have high surface accessibility, we hypothesized that identical amino acids would give rise to similar structural features too. Therefore, we validated the antigenicity of chosen conserved regions at structural level. The root mean square deviation between the peptides is 0.86 Å which suggests that the two peptides bind to class II MHC in a similar orientation (Fig. 4). Docking the conserved region (FPLANVTCM) from the structural polyprotein and its homologous sequence in the human proteome revealed similar interactions with HLA-DRB1 (Fig. 5). Plausible B cell epitope prediction analysis based on the structure of CHIKV structural polyprotein concurred with some of the conserved regions we identified. Thus, validation at both the amino acid and structural level affirmed the antigenic potential of conserved regions in the structural polyprotein. In addition, through the revelation of presence of significant conserved regions in the alphaviral proteins, we surmise that antibodies for differential diagnosis of alpha virus infections should be directed towards the non-conserved regions of the structural proteins for higher specificity. In summary (Fig. 7), our results computationally tease out a possible mechanism mediated through molecular mimicry leading to development of autoimmunity during alpha virus infections which could culminate in arthritis like conditions in infected individuals.

Figure 7
figure 7

Highly conserved peptide sequences present in the structural polyprotein of alphaviruses are antigenic and could act as a trigger for autoimmune reactions explaining one of the possible origins for polyarthritis post-arthritogenic alphaviral infections.

Methods

Viral protein sequences

We carried out the bioinformatic analysis using protein sequences available for the clinical isolates of alphaviruses known to cause polyarthralgia30,31,32,33 and used as protype strains in the literature29. Structural polyproteins of the six arthritogenic alphaviruses: Chikungunya virus (strain S2-African prototype)(UniProt ID: Q8JUX5), the Ross river virus (strain NB5092) (UniProt ID: P13890), the Semliki forest virus (UniProt ID: P03315), the Mayaro virus (strain Brazil) (UniProt ID: Q8QZ72), the O’nyong-nyong virus (strain SG650) (UniProt ID: O90369) and the Barmah forest virus (UniProt ID: P89946) were investigated for the presence of conserved regions. Amino acid sequences of these proteins were retrieved from the UniProt database34.

Multiple sequence alignment

Conserved regions within the structural polyproteins were identified by multiple sequence alignment using MUSCLE alignment algorithm35 in MEGA 7.036. A stretch of amino acids was scored as conserved region if it is present in at least four out of the six alpha viruses.

Homology search

BLASTp program (https://blast.ncbi.nlm.nih.gov/Blast.cgi)37 was employed to expound the existence of sequence homology between the conserved regions identified in the alpha viruses and the human proteome. The search set was limited to Homo sapiens (taxid: 9606) in the UniProtKB/Swiss-Prot database. Default BLASTp algorithm parameters were used and the results were limited only to the top 100 hits.

The homologous proteins list obtained from the BLAST search was manually curated using the Open Targets Platform server38 to identify if a given protein has been reported to be involved in the pathogenesis of rheumatoid arthritis. The Open Targets Platform server is a repository of human proteins and their involvement in diseases at various levels. These proteins bearing the homologous sequences to conserved regions within the structural polyproteins were chosen for further investigation.

Prediction of potential T cell epitopes

The shortlisted conserved regions in the structural polyprotein were explored for their ability to act as T cell epitopes. Peptides from these regions were subjected to HLA II binding analysis using ProPred II analysis tool39. The server uses quantitative matrices for predicting binding of nonamers to HLA class II. Quantitative matrices are chosen for predictive binding because they provide a linear model and are easy to implement. Nonameric peptides generated from the conserved regions, which were predicted as potential binders for HLA class II alleles were selected for further analysis. A frequency distribution of the ability of these peptides to bind to multiple HLA class II alleles was generated.

Peptide-MHC docking was performed using GalaxyPepDock40 server to evaluate the similarity in binding patterns among a conserved region in the alphavirus structural polyprotein and its corresponding homologous region in the human protein. GalaxyPepDock works by identifying templates from experimentally resolved structure databases to predict the protein structure, followed by an energy-based optimization to provide structural flexibility. Default parameters available with the tool have been used and the back-end data containing the parameters used for docking are not available. Crystal structure of HLA-DRB1 complexed with Type II collagen peptide (PDB ID: 6BIN) was retrieved from Protein Data Bank database41. The resident peptide in the crystal structure was stripped off using PyMol to make the binding groove of HLA-DRB1 available for docking the viral peptide and its human homologue.

Prediction of potential linear B cell epitopes

Shortlisted conserved regions from the structural polyprotein were explored for their ability to act as B cell epitopes. Full length amino acid sequence of structural polyprotein of Chikungunya virus (strain S27-African prototype) (UniProt ID: Q8JUX5) was subjected to linear B cell epitope prediction analysis available at Immune epitope database (IEDB)42. The analysis was performed using three different prediction algorithms namely, Bipipred, Emini surface accessibility and Kolaskar and Tongaonkar with default algorithm parameters.

Further, to validate if the conserved regions act as potential linear B cell epitopes at the structural level, we employed ElliPro server43 to predict the regions on the structural polyprotein of the Chikungunya virus (PDB ID: 3J2W). ElliPro accepts a PDB structure as input and uses three different algorithms based on protrusion index of the residues, protein shape approximation and neighboring residues clustering to predict linear antibody-based epitopes. Default parameters of the ElliPro program were used for prediction analysis.