Computational analysis of antigenic epitopes of avian influenza A (H7N9) viruses

Influenza virus can rapidly change its antigenicity, via mutation in the hemagglutinin (HA) protein, to evade host immunity. The emergence of the novel human-infecting avian H7N9 virus in China has caused widespread concern. However, the evolution of the antigenicity of this virus is not well understood. Here, we inferred the antigenic epitopes of the HA protein from all H7 viruses, based on the five well-characterized HA epitopes of the human H3N2 virus. By comparing the two major H7 phylogenetic lineages, i.e., the Eurasian lineage and the North American lineage, we found that epitopes A and B are more frequently mutated in the Eurasian lineage, while epitopes B and C are more frequently mutated in the North American lineage. Furthermore, we found that the novel H7N9 virus (derived from the Eurasian lineage), isolated in China in 2013, contains six frequently mutated sites on epitopes, including site 135, which is located in the receptor-binding domain. This indicates that the novel human-infecting H7N9 virus may already have been subjected to gradual immune pressure and receptor-binding variation. Our results not only provide insights into the antigenic evolution of the H7 virus but may also help in the selection of suitable vaccine strains.

The avian influenza A virus of the H7 subtype is classified as a low pathogenic avian influenza (LPAI) virus and has caused sporadic human infections in recent years [1,2]. In the spring of 2013, an outbreak of the novel H7N9 virus occurred in poultry in eastern China [3][4][5][6] and later caused human infections and deaths [4]. Based on two major sequential re-assortments with the H9N2 virus, we inferred that the virus originated from both wild birds and domestic poultry [7]. Recently, Lam et al. [8] analyzed H7N9 viruses that had emerged between December 2013 and April 2014, and found that H7N9 had spread from eastern to southern China, generating multiple distinct lineages. The rapid evolution of this virus has caused widespread concern, due to its changing pathogenesis, host adaptation [9,10], and antigenicity.
Hemagglutinin (HA), the main antigen and major surface protein of the influenza A virus, is key to infection by this virus. It is the primary target of neutralizing antibodies, which inhibit attachment of the virus to the target cells and subsequent membrane fusion [11]. To evade surveillance by the host's immune system, the influenza virus changes the antigenic properties of HA through mutation or re-assortment of the corresponding gene. Usually, HA antigenic properties are determined by a group of residues clustered into regions called epitopes. More specifically, the HA epitopes are composed of amino acids that directly interact with the neutralizing antibodies [12]. Thus, rapid mutation of HA epitopes could disrupt the interaction of HA with antibodies, resulting in viral immune escape.
Given their critical role in influenza antigenic variation and strain selection for flu vaccines, HA epitopes have been the focus of many studies on influenza virus. The HA epitopes of the human H3N2 virus have been well characterized; there are five of them (A–E) [13][14][15]. Additionally, by comparing the HA structures, the H3 HA epitopes were used to define the epitopes for H1 and H2 HA [16]. More recently, using computational integration, we have formed a comprehensive picture of the HA epitopes of highly pathogenic avian H5N1 viruses [17].
In this study, by integrating epitope regions mapped from well-characterized H3N2 viruses, we inferred the antigenic epitope regions of H7, and analyzed mutations on five inferred epitopes for both lineages of H7 viruses. We further investigated the mutation frequencies of the different epitopes during the evolution of the two H7 lineages, and found that, although the mutations differ between the two lineages, epitope B changed frequently in both lineages. Furthermore, we identified new mutations that occurred in the five inferred epitopes of the 2013 H7N9 viruses, which showed significant changes in the antigenicity of this new H7N9 virus as compared to earlier H7N9 viruses. These findings yield insight into the antigenicity of H7N9 and may facilitate surveillance of influenza H7N9 and vaccine selection.

Data preparation
Full-length HA sequences of human influenza H3N2 viruses, avian influenza H7 viruses, and H7N9 viruses were obtained from the NCBI Influenza Virus Resource [18] on March 16, 2015. Laboratory-derived and identical HA sequences were removed. For each subtype, the HA sequences were aligned using MUSCLE software [19], and the alignments were checked manually. After alignment, the signal peptide regions were removed and the sequences with X character content exceeding 10% were eliminated. This resulted in the inclusion of 611 HA1 sequences for H7 and 3343 HA1 sequences for human H3N2. We obtained 467 and 87 HA1 sequences for avian and human-infecting H7N9 viruses, which were collected since the 2013 out-

Site entropy
Information entropy for each site was computed using the method described by Wiley and Skehel [15]. For each position, the information entropy was further normalized by the base entropy (natural logarithm of 20), which was computed by assuming even distribution of the 20 amino acids. To illustrate the differences among sites more clearly, we defined relative entropy, which was calculated as the ratio of information entropy for each site to the average entropy.
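The entropy calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the function names are hypothetical, and the choice to ignore gaps and ambiguous characters (such as X) when counting residues is an assumption that the text does not specify.

```python
import math
from collections import Counter

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def site_entropy(column):
    """Shannon entropy (natural log) of one alignment column, divided by ln(20).

    Assumption: gaps and ambiguous characters are ignored when counting.
    """
    residues = [aa for aa in column if aa in AMINO_ACIDS]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    h = -sum((n / total) * math.log(n / total) for n in counts.values())
    return h / math.log(20)  # base entropy: 20 equally likely amino acids


def relative_entropies(sequences):
    """Per-site entropy divided by the average entropy over all sites.

    `sequences` is a list of aligned, equal-length strings.
    """
    entropies = [site_entropy(col) for col in zip(*sequences)]
    mean = sum(entropies) / len(entropies)
    if mean == 0:
        return [0.0] * len(entropies)  # fully conserved alignment
    return entropies, [h / mean for h in entropies]


# Toy alignment (hypothetical data, four sequences of three sites each)
toy_alignment = ["AKT", "AKS", "ARS", "ARS"]
toy_entropies, toy_relative = relative_entropies(toy_alignment)
```

With this definition a fully conserved site has a relative entropy of 0, and a site with a relative entropy above 1 is more variable than the average site.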

Mapping the epitope from H3 HA to H7 HA
The crystal structure of H7 HA (A/NETHERLANDS/219/2003(H7N7), PDB ID: 4DJ8) was aligned to that of H3 HA (strain A/AICHI/2/1968(H3N2), PDB ID: 5HMG) by using TM-align [20]. Then, the sites in H7 HA corresponding to those of the A–E epitopes in H3 were defined as candidate antigenic sites for H7. Since the antigenic regions are thought to be exposed, only the surface residues of the H7 structure 4DJ8 were retained. A residue was identified as exposed when its Accessible Surface Area (ASA), which was calculated using the NACCESS program [21] based on the HA trimeric complex of 4DJ8, exceeded 1 Å².

Construction of the phylogenetic tree
A phylogenetic tree of all H7 and H7N9 sequences was constructed by the neighbor-joining (NJ) method using the PHYLIP package [22].

Inferring five epitopes of H7 based on H3
Previous studies showed that influenza viruses with different HA subtypes may share similar antigenic structures [16,23,24]. Moreover, the HAs of the H7 and H3 subtypes belong to the same clade [25]. Our structural comparison showed that the HA structures of the H7 virus and the human H3N2 virus are quite similar, with a TM-score of 0.936 and a root-mean-square deviation (RMSD) of 1.66 (Figure 1A). By comparison, the TM-score and RMSD of H7 were 0.882 and 2.43 when compared to H1N1, 0.902 and 2.15 when compared to H2N2, and 0.886 and 2.26 when compared to H5N1. Thus, among these subtypes, H3 was structurally the closest to H7, suggesting that the H7 and H3N2 viruses share a similar antigenic structure. Furthermore, we aligned the local structures of each epitope of the H7 and H3N2 viruses, and found the RMSD for epitopes A–E to be 1.20, 1.52, 1.07, 1.42, and 1.71, respectively. Based on the structural comparison of HAs, we mapped the five known epitopes of human H3N2 HA onto H7 HA (see Materials and methods). In this way, we inferred 130 antigenic sites for the H7 subtype: 18, 22, 27, 41, and 22 antigenic sites for epitopes A, B, C, D, and E, respectively (Figure S1).

Mutational profiles of HA from H7 Eurasian and North American lineages
As shown in the phylogenetic tree (Figure 1B), sequences of H7 mainly clustered into two lineages, i.e., the Eurasian lineage and the North American lineage. Thus, we compared the mutational profiles of HA1 among the H7 Eurasian lineage, the H7 North American lineage, and human H3N2 viruses (the latter were used as a reference). Mutational profiles were represented by the relative entropies of all HA1 sites. As expected, the mutational profile of HA1 for the H7 Eurasian lineage was significantly correlated with that of the H7 North American lineage, with a Pearson's correlation coefficient of 0.51 (P<2.2×10⁻¹⁶, Figure 1C). This showed that HA1 of both lineages experienced similar selection pressures. Pearson's correlation coefficients between human H3N2 and each of the two H7 lineages exceeded 0.3 (P<1.2×10⁻⁸, Figure 1D and 1E), indicating that these profiles were also significantly correlated.
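The comparison of two mutational profiles reduces to a Pearson correlation between two equal-length vectors of relative entropies. The sketch below is only illustrative: `pearson_r` is a hypothetical helper, the two profiles are assumed to have already been placed on a common site numbering (the text does not describe how sites were matched), and the P value is not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Toy relative-entropy profiles over five shared sites (hypothetical values)
eurasian_profile = [0.0, 2.1, 0.4, 1.8, 0.7]
north_american_profile = [0.1, 1.5, 0.2, 2.4, 0.8]
r = pearson_r(eurasian_profile, north_american_profile)
```

A value of r close to 1 means that the same sites tend to be variable in both lineages; a value near 0 means that the variable sites are largely unrelated.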

Mutations of five antigenic epitopes of H7 Eurasian and North American lineages
Although the mutational profiles of HA of the H7 Eurasian and North American lineages were significantly similar, some differences were also observed, especially in the five antigenic epitopes (Figure 1C–1E). We therefore compared the average entropies of the five antigenic epitopes, the receptor-binding domain (RBD), and other sites among the two H7 lineages and human H3N2 viruses. Mutations in epitope D had relatively low frequencies, while mutations in epitope B had relatively high frequencies in both H7 and H3 viruses. In the H7 Eurasian lineage, mutation frequencies in epitopes A and B were higher than in the other three epitopes (Figure 2B). In the H7 North American lineage, by contrast, epitope C showed an extremely high mutation rate, while epitopes A and D had low mutation rates (Figure 2C). In addition, the RBD sites of the H3N2 virus showed markedly higher frequencies of mutation than did those of the two H7 lineages. We then attempted to identify sites with relatively high mutation rates in the receptor-binding domain. Sites with entropies greater than the average entropy were regarded as high mutation-rate sites, and all others as low mutation-rate sites. In total, there were 15, 8, and 4 high mutation-rate sites for human H3N2, H7 Eurasian lineage, and H7 North American lineage viruses, respectively. The common high mutation-rate sites were positions 135, 158, and 193, while positions 98, 134, 153, 183, 194, 195, 224, and 228 (H3 numbering) were relatively conserved in the human H3N2 virus and the two H7 lineages (Figure 3).
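The classification of RBD sites into high and low mutation-rate sites, and the search for sites that are high in every group, can be sketched as follows. The function name and the toy entropy values are hypothetical, and the sketch assumes that "average entropy" means the average over all sites supplied for a given group of viruses, which the text does not state explicitly.

```python
def high_mutation_sites(entropy_by_site, rbd_sites):
    """Return the RBD sites whose entropy exceeds the average entropy.

    entropy_by_site: dict mapping site number to entropy for one virus group.
    Assumption: the average is taken over every site in `entropy_by_site`.
    """
    mean = sum(entropy_by_site.values()) / len(entropy_by_site)
    return {site for site in rbd_sites if entropy_by_site.get(site, 0.0) > mean}


# Toy entropies for two groups of viruses (hypothetical values)
rbd_sites = [98, 135, 158, 193, 226]
group_1 = {98: 0.0, 135: 0.9, 158: 0.6, 193: 0.5, 226: 0.1, 300: 0.3}
group_2 = {98: 0.0, 135: 0.8, 158: 0.1, 193: 0.7, 226: 0.0, 300: 0.2}

high_1 = high_mutation_sites(group_1, rbd_sites)
high_2 = high_mutation_sites(group_2, rbd_sites)

# Sites that are high mutation-rate sites in both groups
common_high = high_1 & high_2
# RBD sites that are low mutation-rate sites in both groups
common_low = set(rbd_sites) - (high_1 | high_2)
```

Using set intersection keeps the comparison symmetric across groups, so adding a third group (for example, human H3N2) only requires one more `&` operand.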

Mutations in HA of H7N9 viruses isolated in China from 2013 to 2014 and in the previously reported avian H7N9 virus
Some sites in the RBD are highly conserved among avian viruses, while these sites bear significant substitutions in human viruses [26]. By comparing the human-infecting H7N9 viruses isolated in China between 2013 and 2014 to the previously reported avian H7N9 viruses, we discovered that most amino acids were conserved in the RBD, while positions 186, 189, and 226 differed (H3 numbering; Figure 2; Table 1). Substitution G186V had already been reported to influence receptor-binding specificity in Eurasian H7 viruses [27], while Q226L had previously been reported to increase the binding affinity of H7 HA for the human receptor [28]. Mutations were also observed in other inferred antigenic sites, such as position 122 in epitope A, position 312 in epitope C, and positions 174 and 179 in epitope D (H3 numbering; Table 1). Furthermore, we compared human-infecting H7N9 viruses to avian H7N9 viruses isolated in mainland China from 2013 to 2014 (Table 1). Most sites were identical, except for position 57, which is located on epitope E. The majority of human-infecting H7N9 viruses had an R at position 57, whereas most avian H7N9 viruses had a K at this position.

Evolution of HA1 in H7N9 viruses during 2013 to 2014
Since the outbreak in 2013, avian H7N9 viruses have been circulating in China. To investigate the evolution of H7N9 during the past two years, we mapped the dynamic changes in the amino acids along the phylogenetic tree (Figure 4A), and listed the sites with less than 95% conservation. Among the nine sites, sites 132 and 135 were on the inferred epitope A, site 312 was on the inferred epitope C, site 177 was on the inferred epitope D, and sites 57 and 59 were on the inferred epitope E, while site 135 was also in the RBD. Furthermore, we investigated the amino acid differences between human-infecting H7N9 viruses and avian H7N9 viruses (Figure 4B). Avian H7N9 viruses showed mutations at all nine sites, while in human-infecting H7N9 viruses, sites 59, 114, 177, and 255 remained quite conserved. During the early stage of the H7N9 outbreak (March to May 2013), most sites remained conserved. However, from January to March of 2014, all nine sites underwent significant mutation in the avian H7N9 viruses. Most altered amino acids in the human-infecting viruses were consistent with those in avian viruses, except for site 135, which had an S or T substitution, forming an N-linked glycosylation site. The A135T substitution of H7N9 had already been reported by Xu et al. [29], and had also been reported as a genetic marker for mammalian adaptation and virulence in other human-infecting avian influenza viruses, such as H10N8 [30].
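Two steps in this analysis lend themselves to a short sketch: listing alignment columns with less than 95% conservation, and checking for an N-linked glycosylation sequon (N-X-S/T, where X is any residue except proline). Both functions below are hypothetical helpers rather than the code used in the study. They return 0-based indices into the aligned sequences, not H3 numbering, and the decision to ignore gaps and X characters when computing conservation is an assumption.

```python
from collections import Counter


def variable_sites(sequences, threshold=0.95):
    """Return 0-based column indices whose most common residue has a
    frequency below `threshold`.

    Assumption: gaps ('-') and ambiguous residues ('X') are not counted.
    """
    sites = []
    for index, column in enumerate(zip(*sequences)):
        residues = [aa for aa in column if aa not in "-X"]
        if not residues:
            continue  # nothing to evaluate in an all-gap column
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / len(residues) < threshold:
            sites.append(index)
    return sites


def find_sequons(sequence):
    """Return 0-based indices of each N that starts an N-X-S/T sequon (X != P)."""
    return [
        i
        for i in range(len(sequence) - 2)
        if sequence[i] == "N" and sequence[i + 1] != "P" and sequence[i + 2] in "ST"
    ]


# Toy data (hypothetical): one variable column, and an A-to-T change that
# creates a sequon
toy_alignment = ["ACN", "ACN", "ACN", "ADN"]
toy_variable = variable_sites(toy_alignment)
before = find_sequons("GNGA")  # N-G-A: no sequon
after = find_sequons("GNGT")   # N-G-T: sequon starting at index 1
```

In this toy case, an A-to-T change two positions downstream of an asparagine is enough to create a sequon, which is the kind of effect described for site 135.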

Discussion
In this study, we inferred the antigenic epitopes of the H7 subtype influenza virus and analyzed the antigenicity in both H7 subtype lineages and human influenza H3N2 viruses. Five epitopes were mapped from the human H3N2 virus. H7 falls in the same clade as H3 in terms of HA classification, and is very likely to share a similar antigenic structure with H3. Moreover, the H3 HA epitopes have been used to define the epitopes for H1 and H2 HA [16,23]. Thus, it was reasonable to infer the antigenic sites of H7 based on the antigenic sites of human H3N2. There are two main lineages, the Eurasian and North American lineages, in the phylogenetic tree of H7 viruses (Figure 1B). These two lineages showed different patterns in their antigenic sites. Epitopes A and B showed a high frequency of mutation in the Eurasian lineage, while epitopes B and C were frequently mutated in the North American lineage. These results suggested that epitope B is immunodominant in H7 viruses. Compared to the former H7N9 virus, the 2013 H7N9 virus circulating in China had particular mutations, most of which were located in the inferred epitopes. Some of these substitutions were non-conservative, such as T122A, E212R, and D174S, which suggested that these human-infecting H7N9 viruses may already have experienced gradual host immune pressure. Positions 186, 189, and 226 (H3 numbering), located in the receptor-binding region, also showed non-conservative substitutions (G186V, T189A, and Q226L), and the two substitutions at positions 186 and 226 were reported to influence the receptor-binding activity of the H7 virus [27,28]. This may indicate that human-infecting H7N9 viruses have also undergone some variation in host receptor binding.
We also compared dynamic changes in HA1 in human-infecting H7N9 viruses to those in avian H7N9 viruses from 2013 to 2014, but found no significant differences between these two categories of viruses. For human-infecting H7N9, most amino acid changes were consistent with those in avian H7N9, supporting continuous transmission from avian to human hosts. Interestingly, site 135 showed S and T substitutions in the human-infecting H7N9 virus, which resulted in the formation of an N-linked glycosylation site. Since genetic evolution is a continuous process, mutations in avian influenza H7N9 should continue to be monitored, particularly at sites in known epitopes and in the RBD, where changes might facilitate human-to-human transmission.
In summary, we have inferred five antigenic epitopes for the H7 subtype influenza virus and have further analyzed the mutation patterns of each of these epitopes. We also identified new mutations in the 2013 H7N9 virus. Our findings will facilitate monitoring of antigenic changes of this new virus, and may enhance H7N9 surveillance and vaccine selection.