Sequence signatures in envelope protein may determine whether flaviviruses produce hemorrhagic or encephalitic syndromes
We analyzed the envelope proteins of pathogenic flaviviruses to determine whether there are sequence signatures associated with the tendency of viruses to produce hemorrhagic disease (H-viruses) or encephalitis (E-viruses). We found that, at the position corresponding to the glycosylated Asn-67 in dengue virus, asparagine (Asn) occurs in all seven viral species that cause hemorrhagic disease in humans. In contrast, Asn was extremely rare at position 67 in six flaviviruses that cause encephalitis, being replaced by Asp in four of them. Of the 3,246 sequences from H- and E-viruses, 2,916 (90%) contained Asn in position 67 for H-viruses or Asp in position 67 for E-viruses. The change from Asn-67, prevalent in H-viruses, to Asp-67, common in E-viruses, contributes to a more strongly electronegative surface in the E-viruses than in the H-viruses. These findings should help in predicting the disease potential of emerging and re-emerging flaviviruses and in understanding the relationship between protein structure and disease outcome.
Keywords: Flavivirus, Envelope protein, Hemorrhagic disease, Encephalitis, Sequence signature, Dengue, Yellow fever, West Nile, St. Louis encephalitis, Tick-borne encephalitis, Japanese encephalitis, Kyasanur forest disease, Omsk hemorrhagic fever
Flaviviruses are classified into three main groups: tick-borne flaviviruses, mosquito-borne flaviviruses, and flaviviruses with no known vector. Flaviviruses naturally pathogenic to humans are found among both the tick-borne and mosquito-borne groups. The major human pathologies caused by these viruses can be grouped into two types: (a) encephalitis and (b) hemorrhagic disease. We refer here to the corresponding viruses as E-viruses or H-viruses, respectively. Some H-viruses also elicit neurologic symptoms. In contrast, hemorrhagic symptoms in infections with E-viruses are reported rarely. On a phylogenetic tree derived from amino acid sequences, H-virus species are intermixed with closely related E-virus species and nonpathogenic species [4, 5], yet there are lineages among the major pathogens that are associated with more or less severe outcomes [6, 7]. Although there are some correlations between phylogeny and disease association, epidemiology, and clinical manifestations [4, 5, 6], the underlying genetic characteristics that determine infectivity, pathogenicity, and virulence are not well understood.
The envelope protein of flaviviruses is responsible for phenotypic and immunogenic properties of the virion and is believed to mediate virus entry into cells; hence, it has critical roles in pathogenesis and immune evasion. The virus binds to cell surface receptors and undergoes endocytosis. Acidification of the interior of the endosome induces an irreversible conformational change in the envelope protein that exposes its fusion domain and causes the envelope protein to transition from a dimeric to a trimeric association. The viral and vesicle membranes then fuse, and the virus is released into the cytoplasm.
In flavivirus envelope proteins, 72 amino acid residues (out of ~500) are completely conserved in viruses producing either hemorrhagic disease (H-viruses) or encephalitis (E-viruses). Since there is no suitable animal model for flaviviral hemorrhagic disease, it is difficult to study correlations between amino acid changes in the sequences of different viral strains and the ability of these strains to produce hemorrhagic disease. Therefore, we analyzed the nonconserved residues in the envelope protein sequences of flaviviruses for amino acid signatures that can be associated specifically with H-viruses or with E-viruses.
Materials and methods
Table 1 Envelope proteins with their source viruses and taxonomic groups within the genus Flavivirus
Virus group and name (E = E-virus, H = H-virus)
Mammalian tick-borne virus group
  Tick-borne Powassan virus (E)
  Kyasanur forest disease virus (H)
  Alkhumra hemorrhagic fever virus (H)
  Omsk hemorrhagic fever virus (H)
  Tick-borne encephalitis virus (E)
  Louping ill virus (E)
Dengue virus group
  Dengue virus serotype 1 (H)
  Dengue virus serotype 2 (H)
  Dengue virus serotype 3 (H)
  Dengue virus serotype 4 (H)
Japanese encephalitis virus group
  Japanese encephalitis virus (E)
  Murray Valley encephalitis virus (E)
  St. Louis encephalitis virus (E)
  West Nile virus (E)
Yellow fever virus group
  Yellow fever virus (H)
Pairwise profile hidden-Markov-model (pHMM) sequence logos were created using the web tool available at http://www.sanger.ac.uk/Software/analysis/logomat-p. A pHMM specifies position-specific amino acid distributions and insertion and deletion probabilities to describe a sequence family, displaying colored, one-letter amino acid symbols in a stack representing each position. The stack heights represent the information content of each position and are determined by the deviation of the position’s amino acid frequencies from the background frequencies; the colors are based on chemical properties of the amino acids. A comprehensive review and details on HMM logos and the use of profile HMMs for visualizing sequence features in two groups of protein sequences can be found in the literature [14, 15].
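The stack height described above can be illustrated with a minimal sketch. It assumes raw residue frequencies from a single alignment column and a uniform background over the 20 standard amino acids; a real pHMM logo uses the model's emission probabilities and an empirical background, so this is a simplification and not the web tool's implementation. The function name `column_information` is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background; real logos use empirical background frequencies.
UNIFORM_BACKGROUND = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_information(residues, background=UNIFORM_BACKGROUND):
    """Relative entropy (in bits) of one alignment column against the background."""
    counts = Counter(residues)
    total = sum(counts.values())
    info = 0.0
    for aa, n in counts.items():
        p = n / total
        info += p * math.log2(p / background[aa])
    return info


# Toy columns (hypothetical data): fully conserved Asn vs. an even Asn/Asp mix.
conserved = column_information("NNNNNNNNNN")
mixed = column_information("NNNNNDDDDD")
```

A fully conserved column reaches the maximum of log2(20) bits under this background, and any mixture of residues lowers the value, which is why conserved positions appear as tall stacks.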
Electrostatic distribution calculations were done using the Adaptive Poisson–Boltzmann Solver (APBS) with default parameters. Homology models of the post-fusion forms of Omsk hemorrhagic fever virus (OHFV, accession Q7T6D2), West Nile virus (WNV, Q91R02), and TBEV (P14336) were built with Swiss-PdbViewer using the available post-fusion structures of dengue virus (DENV) and TBEV (PDB identifiers 1OK8 and 1URZ, respectively) and pre-fusion structures of WNV and TBEV as templates (PDB identifiers 2HG0 and 1SVB, respectively).
Amino acid positions and domains mentioned herein are based on the dengue virus serotype 2 (DENV2) envelope protein sequence (P12823, range = 281–775; position 1 of envelope protein is position 281 of P12823 and position 67 of envelope protein is position 347 of P12823); protein accession numbers are from the UniProtKB. UniProt Knowledgebase Release 11.2 was used for scanning the Asn-67 site in all Flaviviridae sequences in the database. The sequence P12823 (envelope protein range = 304–421; ID = POLG_DEN2P) was used as a query to BLAST (using default cutoff) against the 58,752 Flaviviridae sequences. This analysis resulted in ~4000 hits. The BLAST pairwise alignments were then scanned to determine the amino acid residues present in the position corresponding to Asn-67 in DENV2.
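The position bookkeeping and the alignment scan described above can be sketched as follows. This is an illustrative outline only: the function names and the toy alignment strings are hypothetical, and the authors' actual scanning procedure is not described beyond the text above. The only values taken from the text are the offset (envelope position 1 = polyprotein position 281) and the target (envelope position 67 = polyprotein position 347).

```python
from collections import Counter

ENVELOPE_START = 281  # envelope position 1 within polyprotein P12823 (from the text)


def envelope_to_polyprotein(env_pos, start=ENVELOPE_START):
    """Convert a 1-based envelope position to a 1-based polyprotein position."""
    return start + env_pos - 1


def residue_at(aligned_query, aligned_subject, query_start, target_pos):
    """Return the subject residue aligned to query position target_pos.

    aligned_query and aligned_subject are gapped strings of equal length;
    query_start is the 1-based query position of the first aligned residue.
    Returns None if the subject has a gap there or the position is not covered.
    """
    pos = query_start - 1
    for q, s in zip(aligned_query, aligned_subject):
        if q != "-":  # only query residues advance the query coordinate
            pos += 1
            if pos == target_pos:
                return None if s == "-" else s
    return None


target = envelope_to_polyprotein(67)

# Toy pairwise alignments (hypothetical): (query, subject, query start position).
toy_hits = [
    ("ACN-TT", "ACDGTT", 345),
    ("ACNTT", "ACNTT", 345),
    ("ACN-TT", "AC-GTT", 345),
]
tally = Counter(residue_at(q, s, start, target) for q, s, start in toy_hits)
```

Tallying the returned residues per viral species would give the kind of per-position amino acid counts reported in the results.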
Results
Our own phylogenetic analysis (not shown) using the genome polyproteins of flaviviruses listed in Table 1 supported previous observations that the mosquito-borne and tick-borne human flaviviruses cluster separately and that the ability to cause hemorrhagic or encephalitic disease is not monophyletic in origin [4, 5]. Both Asn and Asp occur at position 67 of the envelope protein among tick-borne and mosquito-borne flaviviruses and among pathogenic and nonpathogenic flaviviruses. Among three major branches of mosquito-borne viruses (dengue, YFV, and JEV), Asn-67 occurs in two branches (dengue and YFV) where viruses are associated with hemorrhagic disease, and Asn-67 does not occur on the third branch where several of the viruses (JEV, WNV, Kunjin, and Murray Valley encephalitis) are associated with encephalitic disease. Among tick-borne viruses, one major branch includes both hemorrhagic (OHFV and KFDV) and encephalitogenic (TBEV) flaviviruses, and OHFV (an H-virus) is more closely related overall to TBEV (an E-virus) than to KFDV (another H-virus).
From phylogenetic analysis it is also evident that the distances between the DENV types justify treating them as separate species; therefore, we have omitted from this study 140 dengue virus sequences in UniProt for which the type was not specified. On the other hand, the available sequence data indicate that Alkhumra virus, louping ill virus, and Kunjin virus are properly considered subtypes of KFDV, TBEV, and WNV, respectively. Therefore, we did not treat Alkhumra virus, louping ill virus, and Kunjin virus as separate species. For analysis, we combined the sequence available for Alkhumra virus with KFDV, the sequences for louping ill virus with TBEV, and the sequence reported for Kunjin virus with WNV.
The prevalence of Asn at position 67 in all seven H-virus sequences initially analyzed, together with the biological role of this region (see “Discussion”), prompted more extensive study. Comparison of the three-dimensional molecular structures of the post-fusion forms of envelope proteins from an H-virus (DENV2) and an E-virus (TBEV) revealed that the orientation of this specific residue at position 67 is different in the two structures (Fig. 1c).
Examination of alignments of envelope protein sequences in UniProtKB from flaviviral species not represented in the initial set revealed that Asp and Asn are common amino acids at position 67, occurring also in flaviviruses that are not pathogenic or cause only mild symptoms in humans, for example, Asp-67 in Usutu virus and Zika virus, and Asn-67 in Royal Farm virus and Saumarez Reef virus. Since we could not classify such viruses as either hemorrhagic or encephalitogenic, we limited further studies to the known human pathogens listed in Table 1. We tabulated amino acid types at position 67 and compared surface electrostatic distributions in the vicinity of residue 67 in E-viruses and in H-viruses, as electrostatic properties of a protein play a major role in defining protein function and the mechanisms of protein–protein and protein–ligand interactions [24, 25].
Amino acid composition at position 67 in pathogenic flaviviruses
Table: The amino acids occurring at position 67 in flaviviruses associated either with hemorrhagic fever or encephalitis
[Table body not recovered. Surviving column header: Amino acid at position 67. Surviving row labels: Kyasanur forest disease; Omsk hemorrhagic fever; St. Louis encephalitis; Murray Valley encephalitis.]
Envelope protein position 67 is predominantly Asn in H-viruses
We observed Asn in position 67 of the envelope protein in 1,883 out of 3,246 sequences from pathogenic flaviviruses, occurring in 1,850 (93%) of 2,000 H-virus sequences and in only 33 E-virus sequences. These calculations exclude the 140 dengue sequences of unidentified type (all of which also have Asn-67). Of the 1,794 available DENV sequences, only 8 had Asp-67 instead of Asn-67, but the source of the clones could explain the absence of Asn (see “Discussion” below). Asn was found in 98–100% of sequences in five (DENV1–4 and KFDV) out of eight H-viruses. The most significant exception was yellow fever virus (YFV), in which 30% of sequences had Asn-67 and 70% had His-67. To statistically compensate for large differences between the numbers of sequences generally available for viruses that are serious public health problems (like DENV) and those that are less so (like KFDV), we performed statistical analysis of sequences with and without Asn-67 for each viral species. A Chi-square test from a 2 × 2 table (H-virus or E-virus vs. sum of percentages in each species with Asn site or not) showed that the association of Asn-67 with viruses that produce hemorrhagic syndrome is highly significant (Chi-square = 539.8 and P-value < 0.00001).
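For orientation, a Pearson Chi-square statistic for a 2 × 2 table can be computed as in the sketch below. Note the assumption: this example uses the raw sequence counts quoted above (1,850 of 2,000 H-virus sequences and 33 of 1,246 E-virus sequences with Asn-67), whereas the reported test was built from sums of per-species percentages. The sketch therefore illustrates the technique and is not expected to reproduce the reported value of 539.8. The function name `chi_square_2x2` is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson Chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Raw counts from the text (assumption: used here in place of per-species percentages).
h_total, h_asn = 2000, 1850
e_total, e_asn = 1246, 33

chi2_raw = chi_square_2x2(h_asn, h_total - h_asn, e_asn, e_total - e_asn)
h_fraction_asn = h_asn / h_total  # fraction of H-virus sequences with Asn-67
print(f"Chi-square on raw counts: {chi2_raw:.1f}")
```

Because raw counts are dominated by heavily sequenced species such as DENV, the statistic obtained this way is driven largely by dengue, which is the imbalance the per-species percentage approach in the text is meant to offset.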
Envelope protein position 67 is usually Asp in E-viruses
Of 1,246 E-virus sequences examined, Asp-67 was found in 1,066 (86%), including 100% of Japanese encephalitis virus, 86% of TBEV (including its subtypes), and 99% of WNV/Kunjin as well as the single representative of Murray Valley encephalitis virus. Among E-viruses, the most common substitution was the chemically similar amino acid Glu (in 23 of the 1,246 sequences). The Glu substitution was characteristic of the louping ill subtypes of TBEV and also found in 3 of 620 WNV sequences analyzed as well as in the single sequence available from the Kunjin subtype. Of 139 TBEV envelope protein sequences not from louping ill subtypes, 135 (97%) have Asp-67, two have Gly-67, one has Glu-67, and one has Asn-67. St. Louis encephalitis virus was the only major E-virus that did not have Asp-67. A Chi-square test from a 2 × 2 table (E-virus or H-virus vs. having Asp site or not) indicated that the association between Asp-67 and E-virus was statistically significant (Chi-square = 543 and P-value < 0.00001).
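The overall concordance figure follows directly from the counts in this and the preceding subsection; the short check below makes the arithmetic explicit. All numbers are taken from the text, and the variable names are illustrative.

```python
# Counts reported in the text.
h_total, h_asn = 2000, 1850   # H-virus sequences, and those with Asn-67
e_total, e_asp = 1246, 1066   # E-virus sequences, and those with Asp-67

all_sequences = h_total + e_total
concordant = h_asn + e_asp    # Asn-67 in an H-virus or Asp-67 in an E-virus
concordant_pct = 100 * concordant / all_sequences
e_asp_pct = 100 * e_asp / e_total

print(f"{concordant} of {all_sequences} sequences fit the pattern "
      f"({concordant_pct:.1f}%)")
```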
Electrostatic charge distribution
Discussion
We found a significant correlation between the identity of the residue at position 67 of the pathogenic flavivirus envelope protein and the nature of the disease caused by the virus. Since H-viruses and E-viruses are intermixed on the tick-borne branch of the flavivirus phylogenetic tree, phylogeny alone cannot explain the differential occurrence of amino acids at position 67. Out of 3,246 sequences from H- and E-viruses, 2,916 sequences (90%) contained Asn in position 67 for viruses producing hemorrhagic syndrome or Asp in position 67 for viruses producing encephalitic syndrome. Only two flaviviral species that have clear pathology and abundant sequence data did not fit the observed pattern. St. Louis encephalitis virus was the only major E-virus missing Asp-67, having the unique substitution Thr-67 in all 122 sequences surveyed. The only exception to the vast preponderance of Asn-67 in H-viruses was YFV, in which 59 sequences had Asn-67, typical of South American genotype I (found in Brazil), and 140 had His-67. Asn-67 in those 59 YFV sequences is not part of a glycosylation motif. The glycosylation site corresponding to Asn-153 in dengue is also missing in YFV. Thus, neither envelope protein glycosylation nor the presence of Asn at position 67 is essential for yellow fever virus to produce hemorrhagic symptoms. It should be noted that His, found in several YFV strains, is expected to contribute positive electrostatic distributions similar to Asn because of similarities in their properties. Conversely, the residues Asp and Glu occurring at position 67 in the E-viruses are both charged acidic residues that contribute to a similar electrostatically negative distribution around position 67.
Additional exceptions to the amino acid found preponderantly in position 67 fall into one or more of three categories: (1) isolated occurrence of an atypical amino acid within a species; (2) sequence data for a species are sparse; and (3) the virus rarely causes human disease. As noted in “Results” above, DENV1, WNV, and TBEV present examples of isolated atypical amino acids. Of 1,934 surveyed DENV sequences, 8 (all type 1) have Asp-67. Seven of these were isolated from Myanmar in 2000–2002, one from mosquito and six from infected patients, two of whom had hemorrhagic symptoms. However, in each case, other clones from the same patient had Asn-67, so the association of Asn with ability to produce hemorrhagic disease is not contradicted in these cases. The one other dengue sequence with Asp-67 has no publication or description of the source other than its isolation in 2002 in the Philippines. Well-annotated sequence data extracted from viruses isolated from individual patients without passage in cell culture or animal hosts (thus preventing the accumulation of additional sequence changes) will be critical to establish definitively whether there is a correlation between sequence changes and disease in those atypical cases.
Viruses for which data are sparse include Alkhumra, OHFV, and Powassan. The one available sequence of the Alkhumra genotype of KFDV [11, 28], which has been inappropriately called Alkhurma virus, underwent multiple serial passages during which mutation may have occurred. The recently reported OHFV sequence (Q14F59) from strain Bogolubovka not only has Asp-67 but is unexpectedly over 3% different from complete envelope protein sequences described as being from strain “Bogoluvovska” (Q7T6D2, Q06061) and strain Kublin (Q6JJM0), which are more than 99% identical.
Because there are limited stocks of Powassan/deer tick virus, many of the 32 available sequences are from the same few original sources. In particular, Ebel et al. obtained 11 of their samples from Kuno et al. Louping ill virus, Kunjin virus, and Powassan virus rarely infect humans and do not produce epidemic outbreaks. However, severe Powassan encephalitis with varying fatality rates has been reported [6, 32]. This, together with the presence of Asn-67 in Powassan envelope protein sequences, suggests that Powassan virus may have the potential to cause hemorrhagic disease. More generally, any virus with Asn-67, such as Powassan, should be assumed to be potentially capable of producing hemorrhagic disease, even if few reports of disease in humans have accumulated. On the other hand, flaviviruses with Asp-67 may be more likely to be encephalitogenic should they become pathogenic. For a virus to infect humans it has to perform several complex steps, including replication in vector and host, transmission from vector to host, cell attachment, cell lysis, etc. Each of these steps is facilitated by different proteins and different motifs and domains within these proteins. Some viruses may have the genetic capability for some but not all of these processes. Such a virus could mutate and become virulent in humans. An examination of the identity of the amino acid in position 67 of the envelope protein may thus be more effective for predicting a virus's disease potential than is its relationship to other pathogenic viruses.
Asn-67 is located in the relatively less-studied domain II of the envelope protein (domain III is the most studied because of its involvement in the virus–host–cell receptor interaction). In a previous computational analysis of flavivirus envelope proteins, we showed that DENV Asn-67 is under negative selection pressure (and thus might provide a physiological advantage to the virus), is in one of the top five high-affinity MHC-II binding 9-mer peptides (Th-epitope), and is exposed in both the dimer (pre-fusion) and trimer (post-fusion). In DENV, fewer than 10% of the residues are exposed in both dimer and trimer. It is reasonable to conclude that Asn-67 confers an adaptive advantage to DENV and that changes in this amino acid are eliminated quickly. Indeed, mutants of DENV2 in which Asn-67 in the envelope protein was replaced by Gln did not grow in cultured mammalian cells [21, 33]. These mutants produced but did not release virus particles and, therefore, did not propagate in mammalian cells. The mouse-adapted Mochizuki strain of DENV1, which is not pathogenic for humans, has lost the Asn-67 glycosylation site because of a Thr-69 to Ile change.
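The loss of a glycosylation site through a change two residues downstream of the Asn follows from the standard N-linked glycosylation sequon, Asn-X-Ser/Thr with X being any residue except Pro. The sketch below checks for that motif; the function name and the toy peptides are hypothetical and are not taken from database entries.

```python
def has_sequon_at(peptide, index):
    """Return True if peptide[index] is the Asn of an N-X-S/T sequon (X not Pro).

    index is 0-based; positions too close to the end cannot form a sequon.
    """
    if index < 0 or index + 2 >= len(peptide):
        return False
    asn, middle, third = peptide[index], peptide[index + 1], peptide[index + 2]
    return asn == "N" and middle != "P" and third in ("S", "T")


# Toy peptides (hypothetical), with the Asn of interest at index 2.
intact = "LTNTTTE"       # N-T-T: sequon present
thr_to_ile = "LTNTITE"   # N-T-I: third position changed, sequon lost
pro_middle = "LTNPTTE"   # N-P-T: Pro in the middle position blocks the sequon

results = [has_sequon_at(p, 2) for p in (intact, thr_to_ile, pro_middle)]
```

The second peptide mirrors the kind of change described for the Mochizuki strain: the Asn is retained, but replacing the Thr two positions downstream removes the motif.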
A second glycosylation site, Asn-153, is found in dengue and most other flaviviruses in addition to glycosylation sites in the prM protein that could influence pathogenicity. Though missing the Asn-67 glycosylation site, more virulent strains of WNV are glycosylated at the Asn-153 equivalent site while the envelope proteins of less virulent strains are not [35, 36]. Recent studies on Japanese encephalitis virus have shown that a single glycosylation site in the prM protein is critical for viral biogenesis and pathogenicity in mice. Whether the missing glycosylation at position 67 in other flaviviruses is similarly compensated needs further investigation.
We do not know the biological advantage of Asn-67 in viruses where this position is not glycosylated (as in KFDV, YFV, OHFV, and Powassan virus). Both the variety of amino acids found among flaviviruses at this position and the study where Asp-67 was found among multiple samples cloned from individual dengue patients indicate that changes at this position do occur. Therefore, selective pressures must be responsible for the uniformity of amino acid identity within most pathogenic flavivirus species. These pressures may or may not be related to disease manifestation; pathological propensity could be a “side effect”.
The striking electrostatic distribution differences between closely related H- and E-viruses may contribute to the phenotype since electrostatic forces influence the nature of interacting partners. We speculate that mechanisms adopted by H- and E-viruses are distinct and electrostatically controlled. The amino acid sequences from the two tick-borne viruses, TBEV (E-virus) and OHFV (H-virus) (P14336 and Q7T6D2, respectively) in our comparisons are 93% identical and, surprisingly, have distinct electrostatic distributions. In contrast, the two H-viruses, DENV2 and OHFV, although distant phylogenetically, have similar electrostatic distributions, with both having negative patches near Asn-67 (appearing red in Fig. 2).
We suggest that any flavivirus could potentially develop the ability to cause hemorrhagic or encephalitic symptoms. We found that flaviviruses produce either hemorrhagic syndrome or encephalitis in statistically significant correlation with the identity of the amino acid at the position corresponding to the glycosylated Asn-67 of DENV envelope protein. Asn-67 was highly favored in hemorrhagic fever viruses and Asp was highly favored in viruses that cause encephalitis. This association between disease outcome and amino acid in position 67 appears to be correlated with the electrostatic distribution in that region of the envelope protein. These findings should assist in predicting the disease potential of emerging and re-emerging flaviviruses and in understanding the relationship between protein structure and disease outcome.
This work was supported by the U.S. Department of Defense Chemical and Biological Defense program administered by the Defense Threat Reduction Agency and by In-House Laboratory Independent Research (ILIR) funds from the Research and Technology Directorate, Edgewood Chemical Biological Center, Research Development and Engineering Command, US Army. This research was performed while WCB held a National Research Council Research Associateship Award sponsored by the U.S. Army Edgewood Chemical Biological Center. We thank Dr. Hongzhan Huang and Dr. C.R. Vinayaka from Protein Information Resource for providing the statistical analysis and for designing Fig. 1c, respectively.
- 2. D.J. Gubler, G. Kuno, L. Markoff, in Fields Virology, ed. by D.M. Knipe, P.M. Howley, D.E. Griffin, R.A. Lamb, M.A. Martin, B. Roizman, S.E. Straus (Lippincott Williams and Wilkins, Philadelphia, 2007), p. 1153