Significant advances have been made in dissecting the etiology of autoimmunity over the past decade: in identifying genetic risk factors, in understanding the cellular basis for their effects, and in describing the interplay of environmental components. In particular, viral infections have been proposed to trigger these diseases on a background of genetic predisposition. Strong associations between the HLA region and autoimmune disease have been established over the past half century [27].
Moving association studies from single-locus analysis to the analysis of wider gene interactions seems crucial in the continued search for etiologies of complex human diseases. In the present investigation, we focused on endogenous retroviral loci and their genetic interactions. Our aim was to analyze polymorphisms in or near viral loci with at most two breaks in the reading frame of one or more viral proteins, and their association with susceptibility to AID. The list of loci was originally based on NCBI 37.1 but was later supplemented with the loci HERV-K113 and HERV-K115, which are present only in some individuals. In each of the three tested diseases (MS, RA, and T1DM), at least one SNP withstood Bonferroni correction (rs391745, rs5993426, and rs7650483, respectively), suggesting that endogenous retroviruses may play a role in all three. Additional retroviral loci seemed to contribute as well (rs2435031, rs2096537, and rs11172544). Although we observed some overlap, most loci occurred in only one particular disease. Thus, different endogenous retroviruses could conceivably determine the resulting disease phenotype.
Our further analysis took advantage of the fact that association terms and product terms are generally statistically independent; i.e., even if we optimize for association, we can only expect high interaction if there is a biological synergy. In MS, the three-way interaction of SNPs near HERV-Fc1 (rs391745), HERV-K13 (rs2435031), and HLA (rs2135388) in particular suggests that the viral regions are relevant to this disease. Importantly, the statistical synergy indicates that the three loci together may contribute to MS; we are not simply observing additive effects, in which each locus influences MS separately. The actual mechanism by which HERV-Fc1 and HERV-K13 might influence the immune regulation of HLA and eventually cause MS development remains to be elucidated. One possibility is that proviral elements, e.g., antigens mediating molecular mimicry effects, could trigger an autoimmune response on a background of deregulated immune factors. Recombination between retroviruses depends on the co-packaging of different viral RNAs into the same particle [28]. Thus, some form of complementation is a prerequisite for recombination. HERV-Fc1 has the potential to express a full-length Env product of 584 aa and a Gag product of 470 aa. In silico analysis of predicted ORFs in the HERV-Fc1 sequence shows that the first two stop codons in the region of the gag–pol boundary are separated by only 165 bp. Thus, if the first stop codon is indeed a premature termination signal in gag, this would cause deletion of the last 55 aa at the C-terminus of a protein with an original size of 526 aa (unpublished data). It is uncertain whether this Gag corresponds to the full-length protein or to a premature termination product that lacks 55 aa. The HERV-Fc1 pol frame is disrupted by several mutations, whereas HERV-K13 was included in this study based on its near-open pol gene, which is disrupted by only one frameshift. Interestingly, HERV-K13 also has an ORF for a 901 aa Gag protein.
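The distinction drawn above between additive effects and statistical synergy can be made concrete with a small sketch. It assumes a two-locus simplification (the MS result concerns three loci), a carrier/non-carrier coding, and a saturated logistic model, in which the product-term coefficient equals a contrast of the four cell-wise log odds. The counts and the names `ADDITIVE`, `SYNERGISTIC`, `main_effect`, and `interaction_term` are hypothetical and are not taken from the study data or its analysis code.

```python
import math

# Hypothetical case/control counts keyed by carrier status at two loci
# (locus_a, locus_b); 1 = carries the risk allele, 0 = does not.
# These numbers are invented for illustration and are NOT study data.
ADDITIVE = {
    (0, 0): (100, 200),
    (1, 0): (100, 100),
    (0, 1): (150, 100),
    (1, 1): (300, 100),
}
# Same single-locus cells, but more cases among double carriers.
SYNERGISTIC = dict(ADDITIVE)
SYNERGISTIC[(1, 1)] = (600, 100)


def log_odds(counts, cell):
    """Log of cases/controls within one genotype cell."""
    cases, controls = counts[cell]
    return math.log(cases / controls)


def main_effect(counts, cell):
    """Log odds ratio of a single-carrier cell against non-carriers."""
    return log_odds(counts, cell) - log_odds(counts, (0, 0))


def interaction_term(counts):
    """Product-term coefficient of a saturated two-locus logistic model.

    Zero means the loci act additively on the log-odds scale;
    a positive value means double carriers have higher odds than
    the two single-locus effects predict.
    """
    return (log_odds(counts, (1, 1)) - log_odds(counts, (1, 0))
            - log_odds(counts, (0, 1)) + log_odds(counts, (0, 0)))


if __name__ == "__main__":
    for name, table in (("additive", ADDITIVE), ("synergistic", SYNERGISTIC)):
        print(name,
              "main effects:", main_effect(table, (1, 0)), main_effect(table, (0, 1)),
              "interaction:", interaction_term(table))
```

In both hypothetical tables the single-locus effects are identical; only the double-carrier cell differs. The association terms alone therefore cannot distinguish the two tables, which is the sense in which a large product term points to synergy rather than to strong individual associations.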
Similar genetic considerations apply to the associations and two-way interactions observed in the other diseases. In RA, we observed an interaction between two SNPs, rs5993426 and rs2096537, located near a HERV-K encoding gag and pol (stop codon TGA/pro) and a HERV-H encoding pol, respectively. In T1DM, we have shown a synergy between two SNPs in close proximity to human-specific full-length HERV-K loci, HERV-K106 and HERV-K119. HERV-K106 has been identified as having the highest probability of being the youngest full-length endogenous retrovirus in the human genome, with no sequence difference between its LTRs [29]. Moreover, the presence in HERV-K106 of the 293-bp env deletion, which is characteristic of multiple HERV-K type I members, suggests that this deletion may have been present in the infectious ancestral precursors of these viruses. The highly polymorphic HERV-K119 locus possesses intact gene components, which indicates that it has the potential to encode functional proteins. Interestingly, Shin and others [29] suggested that the HERV-K119 pattern of polymorphisms differs from that of other elements of the HERV-K family (e.g., HERV-K113 or HERV-K115) and that the HERV-K119 locus exists as either a full-length HERV-K or a solitary LTR. In this case, a possible role for this HERV in autoimmunity, and in T1DM in particular, would be linked to the presence of the ORFs in susceptible individuals. If so, one would expect the SNP to be linked to either the full-length or the truncated allele. Otherwise, the SNP might be linked to other genes or to the promoter activity of the HERV-K119 LTR. Which scenario is the case remains to be determined.
ERVs can recombine to generate viruses with new infectious properties, as is well established in mice and has also been seen in other species. Our data are currently based exclusively on genetic association and as such cannot predict any functional relationship beyond the presence or absence of SNPs. Nevertheless, statistical associations are most easily explained by a functional relation, and it is therefore pertinent to pursue the potential molecular interactions between viruses. It is possible that HERV-K loci coding for full-length HERV-K genomes have all the features necessary for replication as a viral particle. Moreover, it is plausible that complementation and/or recombination among the multiple full-length HERV-K proviruses, or even members of other HERV families in the human genome, could lead to the emergence of replication-capable viruses. Among 29 human-specific HERV-K insertion events, 17 are full-length human-specific insertions with all sequences required for HERV-K replication. Excluding gene conversion in the host genome, we speculate that recombination/complementation might occur during reverse transcription of two co-packaged RNA genomes. In fact, as described, the rarity of complementation in trans among HERV families is surprising given that retroviral replication involves the obligate co-packaging of two viral mRNAs within the same viral particle [30]. We and others speculate that this rarity is caused by a low probability that more than one element is expressed in the same cell at the same time. Several HERVs have been implicated in autoimmune disorders based on the presence of activated HERV molecules.
An intriguing question remains: how could endogenous retroviral recombinants induce autoimmunity? Presumably they would do so in a manner similar to exogenous animal and human viruses, although the mechanism is not understood. First, the activity of viral regulatory regions could affect the expression of nearby elements such as immune regulatory genes, leading to aberrant regulation. Alternatively, expression of the HERV genes themselves has the potential to activate the immune system. The innate immune system serves as the first defense mechanism during infection by exogenous viruses: it detects pathogen-associated molecular patterns (PAMPs) through pattern recognition receptors (PRRs), which activates an expression cascade of proinflammatory cytokines. The envelope SU-domain of HERV-W has been shown to activate the PRR CD14 and Toll-like receptor (TLR) 4 with stimulation of interleukin (IL)-1β, IL-6, and TNFα [31], and HERV-K dUTPase can activate a number of ILs and interferon (IFN)-γ through TLR2 [32]. The proinflammatory response induced by the innate immune system also contributes to priming the adaptive immune system. The adaptive immune system may be activated by viral proteins acting as superantigens (SAgs), leading to massive non-specific T cell activation and cytokine release [33, 34]. We speculate that virus particles stimulate the innate immune system through the receptor proteins TRIMs, STING, and BST-2. The discovery that TRIM5 triggers innate immune signaling upon binding the capsid of nucleic acid-devoid retrovirus-like particles (RVLPs), and that innate immune signaling is similarly induced following virion restriction by the host factor tetherin, could be particularly interesting in relation to our present study and to our previous observations of genetic associations between TRIM5 and BST-2 markers and the occurrence of MS [35]. Activated HERVs could be a source of such RVLPs and thereby provide a mechanism.
Perhaps formation of particles is enough, and productive infection by the particles is not even necessary, as long as expression is high enough. Later, the reaction might spill over into the adaptive immune system and cause it to respond to cellular components.
One possible caveat in our study is that we have used SNPs as proxies for the viral loci, and it is among these SNPs that we have found statistical associations and interactions. Although the SNPs are very close to the loci, it remains possible that we are observing the effect of other nearby genes. Specifically, there is a HERV-H-related sequence near HERV-K13, which could be relevant to MS. Additionally, none of the viral loci was recognized as disease relevant in high-density genome-wide association studies (GWAS). SNPs located near or inside retroviral elements are poorly represented on the predesigned Illumina chips generally used for large-scale GWAS, because these chips were deliberately designed to focus on single-copy sequences. Also, although GWAS have greater strength than our data sets in sheer numbers, the large number of comparisons made in a GWAS necessitates large compensations for the multiplicity of testing. Therefore, we believe that an approach dedicated to investigating endogenous retroviruses, such as ours, has definite advantages.
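The cost of multiple testing mentioned above can be illustrated with a short sketch of the Bonferroni correction. The panel sizes, the significance level, and the example p-value below are hypothetical placeholders chosen only to show the scaling. They are not the numbers of tests or the p-values of this study or of any particular GWAS.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def passes(p_value, alpha, n_tests):
    """True if p_value survives Bonferroni correction for n_tests."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# Hypothetical values, for illustration only.
ALPHA = 0.05
TARGETED_TESTS = 100        # assumed size of a dedicated retroviral panel
GWAS_TESTS = 1_000_000      # assumed order of magnitude for a GWAS chip
EXAMPLE_P = 1e-5            # assumed nominal p-value for one SNP

if __name__ == "__main__":
    for label, n in (("targeted", TARGETED_TESTS), ("GWAS", GWAS_TESTS)):
        print(label,
              "threshold:", bonferroni_threshold(ALPHA, n),
              "example SNP passes:", passes(EXAMPLE_P, ALPHA, n))
```

Under these assumed numbers, the same nominal p-value survives correction in the targeted panel but not in the genome-wide setting. A larger sample can offset this, but only if the SNP is represented on the chip in the first place.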