Influenza is one of the best-known infectious diseases and attracts attention worldwide. Seasonal influenza epidemics cause over three million cases of severe illness and about 300,000 to 500,000 deaths every year [1]. There have also been four influenza pandemics since the beginning of the 20th century, infecting millions of people and killing hundreds of thousands globally [2]. Because of this, influenza has been the subject of intensive research over the past century. Although much has been learned about the virus, we are still unable to predict the next pandemic, as the emergence of the 2009 H1N1 pandemic showed. The zoonotic transmission potential of novel strains also remains poorly understood. Not knowing when or where the next pandemic will strike poses a significant threat to public health.

A large number of influenza A viruses naturally reside in avian species, where they constantly circulate and evolve. Most influenza A viruses are restricted to their host species and have limited capability to cross the species barrier and infect a new host. It is not rare, however, for a virus strain to acquire the capability to make that zoonotic leap [3, 4]. This is highlighted by confirmed human infections with highly pathogenic H5N1 viruses and, more recently, by the H7N9 outbreak in China [5]. Analysis of the H7N9 outbreak found the virus strain to be a reassortant derived from multiple avian sources, causing infections through direct contact with poultry [6, 7]. As with H5N1, this confirms that avian influenza strains can directly infect humans and cause severe illness.

The species barrier prevents influenza strains from freely infecting different host organisms, because a strain must overcome host range restriction to adapt to a new host. One crucial determinant of host tropism is the receptor specificity of hemagglutinin (HA), in particular its preference for sialic acids with specific linkages on host cells. Human strains predominantly recognize α2,6-linked sialic acids, whereas avian strains preferentially bind α2,3-linked sialic acids [8–10]. Studies of influenza receptor specificity have shown that specific amino acid substitutions can alter the receptor binding site and binding specificity, which in turn alters receptor preference [11–13]. Another major determinant is the viral polymerase complex, more specifically the PB2 subunit, which has long been implicated as playing a crucial role in host tropism. A single amino acid residue in PB2, at position 627, was found to be sufficient to determine the host range of influenza viruses [14–16]. Glutamic acid is found at position 627 in most avian strains, whereas replacing it with lysine enables viral replication in humans [17–22]. Genomic signatures of avian and human influenza viruses have also been identified using position-specific entropy profiles that compare the two groups of viruses [23]; mutations at specific positions may render an avian strain capable of infecting humans. All of this information contributes to our understanding of the host tropism of influenza viruses.

Knowledge of the molecular mechanisms underlying host tropism is useful for constructing computational prediction models. A prediction model was first constructed by Qiang and Kou to discriminate between avian and human influenza A viruses based on molecular patterns in protein sequences [24]. It used wavelet packet decomposition to transform protein sequences into energy feature vectors, which were then used to train an artificial neural network (ANN). A more recent model by Wang et al. used the previously identified avian and human genomic signatures to classify avian and human strains [25]: position-specific entropy profiles of avian and human protein sequences were encoded with amino acid physicochemical properties and used to train a support vector machine (SVM). Both models classify strains as avian or human from a combination of six internal proteins of influenza A viruses: the matrix protein M1, the nucleoprotein (NP), the non-structural protein NS1, and the three RNA polymerase subunits (PA, PB1 and PB2). Such prediction models could be useful for predicting interspecies transmission of influenza A viruses.

In this study, we extended these prediction models to include all 11 influenza proteins for the prediction of host tropism. The 11 proteins are HA, neuraminidase (NA), NP, both matrix proteins (M1 and M2), both non-structural proteins (NS1 and NS2), the polymerase subunits PA, PB1 and PB2, and PB1-F2. A prediction model was constructed for each individual influenza A protein to predict the host tropism of that protein. In addition, a combined prediction model was constructed using all 11 proteins of each strain. As in previous studies, the final model classifies influenza A viruses as avian or human from their protein sequences, providing clues to the host range that a novel influenza A strain might be predisposed to. This could provide early insight into novel strains capable of crossing the species barrier and is a step towards predicting interspecies transmission of influenza A viruses.


Influenza protein sequence dataset

A total of 67,940 influenza A protein sequences isolated from avian and human hosts were obtained from the Influenza Research Database in February 2014 [26]. Incomplete and duplicate sequences were removed to minimize bias in the machine learning training process. Sequences from avian isolates were labeled as negative samples and those from human isolates as positive samples. Each protein dataset was then divided into separate training and testing datasets by randomly allocating 20 percent of its sequences to the testing dataset. Table 1 gives the total number of samples used to train and test each prediction model, and the distribution of influenza subtypes in the training and testing datasets of each protein is listed in an additional file [see Additional file 1]. In summary, 20,923 positive (human) and 30,548 negative (avian) samples were used to train the machine learning classification models, and 5,262 positive and 7,668 negative samples were held out as testing datasets for external validation.
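The cleaning and splitting steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: treating a sequence as "incomplete" when it is shorter than a hypothetical `min_length` or contains a non-standard residue is an assumption, as is splitting each class separately (stratified), chosen because the human-to-avian ratios reported above are nearly identical in the training and testing datasets. The function names are hypothetical.

```python
import random

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequences(records, min_length):
    """Remove incomplete and duplicate sequences.

    records: list of (sequence, label) pairs; label 1 = human (positive),
    0 = avian (negative). "Incomplete" is approximated here (assumption) as
    shorter than min_length or containing a non-standard residue such as 'X'.
    """
    seen = set()
    kept = []
    for seq, label in records:
        seq = seq.upper()
        if len(seq) < min_length or not set(seq) <= STANDARD_AA:
            continue  # treat as incomplete
        if (seq, label) in seen:
            continue  # exact duplicate with the same host label
        seen.add((seq, label))
        kept.append((seq, label))
    return kept


def split_train_test(records, test_fraction=0.2, seed=0):
    """Randomly hold out test_fraction of each class as the testing dataset."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted({lab for _, lab in records}):
        group = [r for r in records if r[1] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy example with made-up sequences (not real influenza data).
toy = [("MKAILV", 1), ("MKAILV", 1), ("MKAXLV", 0), ("MKA", 0)]
toy += [("MKAILV" + "G" * i, 0) for i in range(1, 11)]
cleaned = clean_sequences(toy, min_length=5)  # duplicate, 'X' and short sequences dropped
train, test = split_train_test(cleaned)
```

Fixing the random seed makes the split reproducible, which matters when the same held-out sequences must be reused for every evaluation.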

Table 1 Total number of positive and negative samples for protein datasets and combined dataset.

The combined model was constructed from the sequences of all proteins, so only strains with complete sequences for all 11 proteins were included in its dataset. In total, 3,272 positive (human) and 3,923 negative (avian) samples were used to train the combined prediction model, and 799 positive and 989 negative samples were used as the external testing dataset.

Transforming protein sequences into feature vectors

Amino acid composition and amino acid physicochemical properties were extracted from each protein sequence as features for training the machine learning algorithms. First, the composition of each of the 20 standard amino acids was computed as its frequency over the entire length of the protein sequence, yielding 20 features that describe the individual amino acid content of the sequence.
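As a small illustration of the composition features described above, the function below returns the 20 amino acid frequencies of a sequence. Expressing them as fractions of the sequence length (rather than percentages) and rejecting empty input are choices made for this sketch; the study's exact scaling is not stated.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def amino_acid_composition(seq):
    """Return the frequency of each standard amino acid in seq, in AMINO_ACIDS order."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


composition = amino_acid_composition("AACD")
# composition[0] (A) is 0.5; C and D are 0.25 each; the other 17 entries are 0.0
```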

Next, each sequence was transformed using the method developed by Dubchak et al., in which three descriptors, composition (C), transition (T), and distribution (D), are calculated to globally describe amino acid properties [27, 28]. The four original amino acid properties (hydrophobicity, normalized van der Waals volume, polarity, and polarizability) were included, along with two other properties: charge and solvent accessibility. For each property, the 20 amino acids are divided into three groups based on the amino acid indices of Tomii and Kanehisa [29]. The global CTD descriptors can be calculated using the following equations:

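$$C = \left( \frac{n_1 \times 100}{N},\ \frac{n_2 \times 100}{N},\ \frac{n_3 \times 100}{N} \right)$$
$$T = \left( \frac{T_{G_1 G_2} \times 100}{N-1},\ \frac{T_{G_1 G_3} \times 100}{N-1},\ \frac{T_{G_2 G_3} \times 100}{N-1} \right)$$
$$D = \left( D_1,\ D_2,\ D_3 \right)$$
$$D_i = \left( \frac{P_{i0} \times 100}{N},\ \frac{P_{i25} \times 100}{N},\ \frac{P_{i50} \times 100}{N},\ \frac{P_{i75} \times 100}{N},\ \frac{P_{i100} \times 100}{N} \right), \quad i = 1, 2, 3$$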

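Here, $N$ is the length of the sequence, $n_i$ is the number of residues belonging to property group $i$, $T_{G_iG_j}$ is the number of adjacent residue pairs in which one residue belongs to group $i$ and the other to group $j$, and $P_{ik}$ is the sequence position of the residue at which $k$ percent of the group-$i$ residues have been encountered ($P_{i0}$ being the first residue of group $i$). Composition thus describes the percentage frequency of each property group within the sequence; transition gives the percentage of adjacent residue pairs that switch between two different property groups; and distribution gives the relative positions in the sequence of the first, 25%, 50%, 75% and 100% of the residues of a particular property group [30–32]. Note that the composition calculated in this step refers to the composition of each amino acid property group rather than of the 20 standard amino acids. In this way, 21 global descriptors (3 for composition, 3 for transition and 15 for distribution) were calculated for each amino acid property. In total, 146 features (20 amino acid compositions plus 21 descriptors for each of the six properties) represent each protein sequence in the training of the individual protein prediction models.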
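A minimal Python sketch of the CTD calculation for a single property is given below. It follows the C, T and D definitions above, but two details are assumptions: the hydrophobicity grouping (polar RKEDQN, neutral GASTPHY, hydrophobic CLVIMFW) is the one commonly used in Dubchak-style CTD implementations rather than one taken from this study, and the rounding used to locate the 25%, 50% and 75% residues (floor, but at least the first residue) is only one of several conventions in use.

```python
from math import floor

# Assumed grouping for hydrophobicity (polar, neutral, hydrophobic); the
# study's exact groupings for its six properties are not reproduced here.
HYDROPHOBICITY_GROUPS = ("RKEDQN", "GASTPHY", "CLVIMFW")


def ctd_descriptors(seq, groups=HYDROPHOBICITY_GROUPS):
    """Return the 21 CTD descriptors (3 C + 3 T + 15 D) of seq for one property."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two residues")
    lookup = {aa: g for g, members in enumerate(groups, start=1) for aa in members}
    labels = [lookup[aa] for aa in seq]  # KeyError on non-standard residues

    # Composition: percentage of residues falling in each group.
    comp = [labels.count(g) * 100 / n for g in (1, 2, 3)]

    # Transition: percentage of the N - 1 adjacent pairs that switch between
    # groups i and j, counting both directions (i->j and j->i).
    pairs = list(zip(labels, labels[1:]))
    trans = [sum(1 for a, b in pairs if {a, b} == {i, j}) * 100 / (n - 1)
             for i, j in ((1, 2), (1, 3), (2, 3))]

    # Distribution: positions (as % of N) of the first, 25%, 50%, 75% and
    # 100% residues of each group; all zeros if the group is absent.
    dist = []
    for g in (1, 2, 3):
        positions = [i + 1 for i, lab in enumerate(labels) if lab == g]
        m = len(positions)
        if m == 0:
            dist.extend([0.0] * 5)
            continue
        ranks = [1] + [max(1, floor(m * q)) for q in (0.25, 0.5, 0.75)] + [m]
        dist.extend(positions[r - 1] * 100 / n for r in ranks)
    return comp + trans + dist


descriptors = ctd_descriptors("RKAC")
```

Applying the same function with six property groupings and appending the 20 amino acid composition values gives the 20 + 6 × 21 = 146 features per sequence stated above.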

Training machine learning classifiers

The first step of machine learning classification is to select the algorithm best suited to the datasets. Experiments with various machine learning classifiers were performed on the WEKA platform, covering random forest, k-nearest neighbor (kNN), Naïve Bayes, support vector machines (SVM), and artificial neural networks (ANN) [33]. Preliminary training showed random forest to be the best suited to the datasets, so it was chosen as the classifier for all prediction models. Random forest is an ensemble learning method that combines many decision tree classifiers. Each tree in the forest is grown on a bootstrap sample of the training data, and each node is split using only a randomly selected subset of the features [34].
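The two sources of randomness that distinguish a random forest from a single decision tree can be illustrated with the standard library alone. The sketch below is not a full random forest (tree growing and voting are omitted) and is not WEKA's implementation; it only shows how one tree's bootstrap sample and one node's random feature subset might be drawn. The value of 8 candidate features is illustrative; the tuned values used in the study are those reported in Table 2.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement: the training rows of one tree."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def candidate_features(n_features, n_try, rng):
    """Draw n_try distinct feature indices: the only features tried at one node split."""
    return rng.sample(range(n_features), n_try)


rng = random.Random(42)
rows = bootstrap_indices(1000, rng)          # some rows repeat, others are left out
features = candidate_features(146, 8, rng)   # 8 of the 146 features (illustrative)
out_of_bag = set(range(1000)) - set(rows)    # rows this tree never sees
```

Because each tree sees a different bootstrap sample and different candidate features, the trees make partly independent errors, which is what lets their combined vote outperform any single tree.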

All prediction models were trained using 10-fold cross-validation. In 10-fold cross-validation, the dataset is divided into 10 subsets (folds) of roughly equal size. Training is repeated for 10 rounds; in each round, a model is trained on nine folds and tested on the remaining fold. Every sample in the dataset is therefore tested exactly once, by a model that did not see it during training, which gives an estimate of performance on unseen data and helps to detect overfitting.
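The fold rotation described above can be sketched as an index generator. This version shuffles the samples and deals them into folds without stratifying by class, whereas WEKA typically stratifies its cross-validation folds; the function name and seed are illustrative.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Samples are shuffled once and dealt into k folds of near-equal size;
    each fold is used as the test set exactly once.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


splits = list(k_fold_indices(25, k=10))
tested = sorted(j for _, test in splits for j in test)
# tested == list(range(25)): every sample is tested exactly once
```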

Parameter optimization

Parameter optimization is an important step in training machine learning classifiers: to achieve the best performance, the classifier's parameters must be tuned to the training dataset. For each model, a grid search was first carried out to select the best parameters for training the final model. The random forest parameters tuned were the number of trees and the number of randomly selected features considered at each split. Grid search exhaustively evaluates every combination of parameter values in a manually specified grid and selects the combination achieving the best performance. This, however, raises the problem of choosing the upper limits of the grid. Adding trees generally makes a random forest more stable without degrading its performance, but beyond a certain number of trees there is no significant performance gain, only additional computational burden [35]. In view of this, a maximum of 150 trees and 22 features was specified for the grid search. The optimized parameters for each prediction model are shown in Table 2, and the prediction models were constructed with these optimized numbers of trees and features. The values in Table 2 show that the maximum numbers of trees and features are not necessary for the best performance of the prediction models.
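An exhaustive grid search over the two random forest parameters can be sketched as below. The `evaluate` callback is hypothetical: in practice it would train a forest with the given parameters and return a score such as 10-fold cross-validation accuracy. The grid steps (10 to 150 trees in steps of 10, 1 to 22 features) are assumed for illustration; only the maxima of 150 trees and 22 features come from the text.

```python
from itertools import product


def grid_search(evaluate, tree_grid, feature_grid):
    """Evaluate every (n_trees, n_features) pair and return the best one.

    evaluate(n_trees, n_features) must return a score where higher is better.
    Ties keep the first setting encountered, which favours fewer trees
    when tree_grid is ascending.
    """
    best_params, best_score = None, float("-inf")
    for n_trees, n_features in product(tree_grid, feature_grid):
        score = evaluate(n_trees, n_features)
        if score > best_score:
            best_params, best_score = (n_trees, n_features), score
    return best_params, best_score


TREE_GRID = range(10, 151, 10)   # up to 150 trees (assumed step of 10)
FEATURE_GRID = range(1, 23)      # up to 22 features per split


def toy_score(n_trees, n_features):
    """Stand-in for cross-validation accuracy, peaking at 60 trees and 9 features."""
    return -((n_trees - 60) ** 2) - (n_features - 9) ** 2


best, score = grid_search(toy_score, TREE_GRID, FEATURE_GRID)
```

The cost grows with the product of the grid sizes (15 × 22 = 330 trainings here, each itself a 10-fold cross-validation), which is why capping the grid matters.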

Table 2 Random forest optimized parameters.

Feature selection for combined model

For the combined prediction model comprising all 11 proteins, dimensionality reduction was applied to reduce the number of features used to train the machine learning classifier. This was achieved by feature selection using the variable importance measure of random forest. As this measure was not available in WEKA, this step was performed with the randomForest package developed by Liaw and Wiener for the statistical software R [36, 37]. Features were ranked by mean decrease in Gini, which totals the decrease in node impurity from splits on a given feature and averages it over all trees in the forest [34, 38]. The top 15 features of each protein were selected for inclusion in the dataset for the combined prediction model.
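The quantity behind the Gini importance ranking can be illustrated with the impurity decrease of a single split, followed by selection of the top-ranked features. This is a simplified sketch with hypothetical function and feature names: randomForest accumulates such decreases over every split on a variable and averages them over trees, and here the decrease is not scaled by the size of the node within the tree, as a full implementation may do.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease when the parent node is split into left and right."""
    n = len(parent)
    children = (len(left) * gini(left) + len(right) * gini(right)) / n
    return gini(parent) - children


def top_k_features(importance, k=15):
    """Return the names of the k features with the largest importance scores."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k]


# 0 = avian, 1 = human; a perfect split removes all impurity.
perfect = gini_decrease([0, 0, 1, 1], [0, 0], [1, 1])
# Hypothetical importance scores for three features of one protein.
chosen = top_k_features({"charge_C1": 0.1, "polarity_T12": 0.5, "hydro_D3_50": 0.3}, k=2)
```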

Model performance evaluation

The performance of the prediction models was evaluated with several measures: prediction accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and Matthews correlation coefficient (MCC). Prediction accuracy measures the overall accuracy of the classifier as the number of correctly classified avian and human samples divided by the total number of samples in the dataset. Sensitivity and specificity summarize the accuracy of positive and negative predictions respectively: sensitivity is the proportion of positive (human) samples in the dataset that are correctly predicted, and specificity is the proportion of negative (avian) samples that are correctly predicted. AUC, on the other hand, is the probability that a randomly chosen positive sample receives a higher prediction score than a randomly chosen negative sample, so that 0.5 corresponds to random guessing and 1 to perfect separation [39]. Lastly, MCC measures the correlation between the observed and predicted classes of the binary classification, ranging from −1 to +1.
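The evaluation measures above can be computed as follows. This is a minimal sketch: human is taken as the positive class, as in the text, and AUC is computed from its probabilistic definition over all positive-negative pairs (ties count as one half), which is exact but quadratic in the number of samples; a rank-based formula would be preferable for large datasets. The counts and scores in the example are made up.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity, MCC) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # human (positive) samples predicted correctly
    specificity = tn / (tn + fp)  # avian (negative) samples predicted correctly
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return accuracy, sensitivity, specificity, mcc


def auc(pos_scores, neg_scores):
    """Probability that a random positive is scored above a random negative."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


acc, sens, spec, mcc = confusion_metrics(tp=90, fn=10, tn=80, fp=20)
area = auc([0.9, 0.8], [0.1, 0.8])  # 3.5 of 4 pairs ordered correctly -> 0.875
```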


Comparison of machine learning algorithms

Many machine learning algorithms can solve classification problems, each with its own merits and limitations and each suited to different kinds of datasets. To maximize prediction performance, a suite of machine learning classifiers was tested on the WEKA platform. Table 3 shows the preliminary prediction performance of random forest, Naïve Bayes, kNN, SVM, and ANN classifiers trained on the HA dataset. All classifiers performed similarly well, suggesting a clear demarcation between avian and human HA proteins, consistent with the distinct receptor binding specificities of avian and human HA. Nevertheless, random forest outperformed all other classifiers, achieving 98.58% prediction accuracy (AUC = 0.996; MCC = 0.972), and was therefore chosen as the classifier for training the remaining individual protein prediction models.

Table 3 Comparison of machine learning classifiers.

Performance evaluation of individual protein prediction models

After the parameters were optimized for each protein dataset, prediction models for the 11 individual influenza proteins were constructed with random forest. Table 4 gives the 10-fold cross-validation performance of each individual protein prediction model. All models achieved high predictive performance: the lowest was the NS2 model, with 96.57% accuracy (AUC = 0.980; MCC = 0.916), while the HA model performed best, with 98.62% accuracy (AUC = 0.998; MCC = 0.972). Because cross-validation performance is measured on held-out folds, these consistently high scores suggest that the models are not substantially overfitted, which would otherwise reduce their ability to classify novel protein sequences.

Table 4 10-fold cross-validation performance on optimized parameters for prediction models.

The prediction models were further validated independently with the separate testing datasets and again performed well, as shown in Table 5. The lowest accuracy was that of the M2 model, at 97.09% (AUC = 0.993; MCC = 0.939), while the highest was achieved by the HA model, with 98.78% accuracy (AUC = 0.997; MCC = 0.976). These results further demonstrate the high predictive accuracy of all individual protein models and support their use for predicting the host tropism of influenza proteins.

Table 5 Performance evaluation with separate testing dataset.

Selected features representing each protein dataset

The combined prediction model of influenza virus host uses features from all 11 proteins. Pooling all amino acid physicochemical properties from every protein dataset would produce a complex, high-dimensional feature space that is likely to contain redundant features. Dimensionality reduction was therefore achieved by feature selection, keeping the most relevant features of each protein. Selecting the top 15 features from each of the 11 proteins yields a total of 165 features representing the sequences of all 11 proteins in a virus strain.
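Assembling the 165-feature vector for one strain can be sketched as a concatenation of each protein's selected features in a fixed protein order. The data layout (a dictionary of feature dictionaries per protein), the protein ordering and the feature names are assumptions for illustration; returning `None` for a strain lacking any protein mirrors the rule that only strains with all 11 protein sequences enter the combined dataset.

```python
PROTEINS = ("PB2", "PB1", "PB1-F2", "PA", "HA", "NP", "NA", "M1", "M2", "NS1", "NS2")


def combined_vector(strain_features, selected, proteins=PROTEINS):
    """Concatenate the selected features of every protein into one vector.

    strain_features: protein -> {feature name -> value} for one strain.
    selected: protein -> list of selected feature names (e.g. its top 15).
    Returns None if the strain lacks any protein.
    """
    vector = []
    for protein in proteins:
        features = strain_features.get(protein)
        if features is None:
            return None
        vector.extend(features[name] for name in selected[protein])
    return vector


# Hypothetical feature names: 15 selected out of 146 per protein.
selected = {p: [f"f{i}" for i in range(15)] for p in PROTEINS}
strain = {p: {f"f{i}": float(i) for i in range(146)} for p in PROTEINS}
full = combined_vector(strain, selected)  # 11 x 15 = 165 values
partial = combined_vector({p: v for p, v in strain.items() if p != "HA"}, selected)  # None
```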

When protein sequences were transformed into feature vectors, each amino acid physicochemical property was represented by 21 global descriptors. For some proteins, a single property stood out, with several of its descriptors selected among the top 15, signifying the importance of that property in determining host tropism for that protein. Table 6 lists the top amino acid properties of each protein dataset, showing the properties with high mean decrease in Gini gain. Several proteins, including HA, NA, NS1, PA, PB1 and PB2, appear to have dominant amino acid properties that play a major role in classifying them as avian or human.

Table 6 Top amino acid physicochemical properties identified using variable importance feature in random forest.

Performance evaluation of combined proteins model

The combined prediction model was constructed from the top features of each protein sequence. In contrast with the individual models, which predict the host tropism of a single protein, this final model predicts the host of an influenza virus from its full set of proteins, which may be of mixed origins.

The cross-validation performance of the combined model is also shown in Table 4. Notably, the accuracy of the combined model exceeded that of every individual protein model, reaching 99.72% (AUC = 0.999; MCC = 0.994). On the separate independent testing dataset, the model correctly classified 99.83% of test instances (AUC = 0.998; MCC = 0.997), as shown in Table 5. These results indicate that incorporating features from all proteins improves prediction performance.


All 11 influenza protein prediction models showed high predictive performance in distinguishing between avian and human influenza proteins. This suggests that, apart from HA and PB2, the remaining nine influenza proteins also differ clearly between avian and human strains. However, the roles these nine proteins play in determining host tropism are still unclear, and further research is needed to establish them. It is also unknown how many proteins must change to tip the balance and enable an avian strain to cross the species barrier and infect humans. Nevertheless, these individual protein prediction models are a first step that should prove useful in future work on directly predicting interspecies transmission of influenza viruses.

Important amino acid physicochemical properties in host tropism

Over the long evolutionary history of influenza, virus transfers between host species have allowed gene segments to mix, producing reassortant strains with both avian and human segments. This process could enhance viral pathogenicity and allow reassortant strains to adapt to new host species. Three of the four influenza pandemics since the beginning of the 20th century have been shown to arise from reassortment between avian and human strains [2, 25, 40, 41]. Because different proteins may raise or lower the species barrier that a novel strain must cross, it is useful to predict the host tropism of each individual protein. This would help in understanding the complex interplay between the components of an influenza strain.

The feature selection process may have revealed amino acid physicochemical properties that are important in determining the host tropism of individual proteins. Interestingly, charge, normalized van der Waals volume and polarizability carry more weight than the other properties in the classification of HA host tropism. HA bears the initial responsibility for overcoming the host species barrier, since it mediates entry into host cells by binding to sialylated glycan receptors on the cell surface [42]. As mentioned, mutations can alter receptor binding specificity and thereby change receptor preference. Studies of glycan receptor specificity have found that electrostatic charge influences the binding dynamics between HA and receptors on host cells [43]. In general, HA is positively charged, while the glycan receptor on the host cell is negatively charged [44]. Increasing or decreasing the net charge of HA would therefore alter electrostatic interactions, which in turn affect binding affinity. This has been demonstrated by studies showing that amino acid substitutions that increase or decrease charge respectively enhance or reduce receptor binding affinity and avidity [45, 46]. A molecular dynamics study of HA and the human receptor found that mutations in HA affect the binding free energy through electrostatic and non-polar interactions [47]. Polarizability, a measure of how readily a molecule's electron distribution is distorted, would therefore also play a part in the binding interaction between HA and the human receptor. Glycan topology is also thought to critically influence receptor binding by avian and human strains: interactions between HA and glycan receptors were found to be influenced by electrostatic charge and van der Waals volume, causing glycans to adopt distinct topological profiles [48]. These changes are likely to affect the binding of HA to glycan receptors on the cell surface, underscoring the importance of the selected amino acid properties in the switch of species specificity.

PB2 is another heavily investigated influenza protein, and its two top amino acid properties, charge and solvent accessibility, are consistent with previous molecular and protein structure studies. The crystal structure of an independently folded domain of PB2 revealed that the critical residue 627 lies in the middle of a solvent-exposed surface [17, 49]. The glutamic acid preferred by avian strains forms a negatively charged region, which lysine disrupts while also creating a positively charged region on the surface [17, 49]. Mutations in this region therefore appear to affect polymerase activity. Both charge and solvent accessibility thus play an important part in the classification of host tropism by the PB2 prediction model.

The roles of the other proteins in determining host tropism are less well characterized, and molecular dynamics studies of them are even fewer; nevertheless, differences between the amino acid residues of avian and human strains can be interpreted as changes in these amino acid properties. For the other two polymerase subunits, PA and PB1, differences between avian and human sequences would likely affect polymerase activity and thereby host range. How these changes contribute to host range determination is still unknown, but the prominence of hydrophobicity, polarity and solvent accessibility suggests involvement in interactions with host proteins. Another example is NS1, which is known to be a potent viral antagonist of the host interferon response [17, 50]. Studies have increasingly implicated NS1 in host range determination, although the mechanism remains unclear. It has been hypothesized that NS1 proteins from different strains control the interferon response with varying efficiency, which could contribute to host range restriction [50, 51]. Such functional or structural variations may also be reflected in differences in the charge of the protein that distinguish avian and human NS1 proteins. Even minute changes in protein structure and function could thus tip the balance towards adaptation to an avian or a human host. Further molecular studies are needed to investigate how these properties contribute to the host tropism of influenza viruses.

Computational prediction models of influenza host

The two previous computational prediction models by Qiang and Kou [24] and Wang et al. [25] successfully classified avian and human strains. One drawback, however, is that they use only six internal proteins of influenza. This disregards the two influenza glycoproteins, HA and NA, which play a major role in determining host tropism; omitting them from the prediction process means the prediction may not accurately reflect the tropism of the entire strain. In contrast, the combined prediction model constructed in this study uses information from all 11 influenza proteins, giving a more balanced representation of the virus strain. While the final prediction model still falls short of directly predicting interspecies transmission of influenza viruses, it could provide early insight into the host range to which a virus strain might be adapted. The prediction models were implemented on a web server and are available for prediction online at

Protein prediction models as host tropism prediction system

Together, the 11 protein prediction models can be used as a host tropism prediction system. The system is demonstrated below with eight selected sample strains, detailed in Table 7: four avian and four human strains of various influenza subtypes. These strains were selected manually and checked carefully against the training and testing datasets to ensure that they were used neither in constructing the prediction models nor in the independent testing stage. The prediction results are therefore not biased by prior exposure, since none of the eight strains had previously been encountered by the prediction models.

Table 7 Further information on sample strains used in the demonstration of host tropism prediction system.

The host tropism prediction results for all eight strains are illustrated in Figure 1. For two human strains, A/New York/231/2003 and A/Guangdong/ST798/2008, all 11 proteins were predicted correctly. These two strains belong to influenza subtypes common in humans, which circulate worldwide and infect humans during the annual flu season [4]; all of their proteins have therefore adapted well to humans and were correctly predicted by the system. Likewise, all 11 proteins were predicted correctly for two avian strains, A/turkey/England/50-92/1991 and A/wild duck/Korea/SH19-50/2010. These two avian strains, of subtypes H5N1 and H7N9, were isolated from turkey and wild duck before these subtypes appeared in humans. All 11 of their proteins were clearly avian and were again correctly predicted by the system. The results for these four strains demonstrate the high accuracy of the host tropism prediction system: although each protein prediction is made independently of the others, the system classified every protein in each strain correctly.

Figure 1

Host tropism prediction results for sample strains. Results for the four avian strains are shown in the top half and results for the four human strains in the bottom half. The prediction results are strung together to represent an entire influenza A genome of eight segments encoding 11 proteins; the proteins encoded by each segment are listed at the bottom of the figure. Each protein prediction is independent and is not influenced by the predictions for other proteins. Blue bars represent an avian prediction by the corresponding protein prediction model, and red bars a human prediction. Grey bars indicate that no prediction was made because the corresponding protein sequence was unavailable or incomplete. All 11 proteins were predicted correctly for the first two avian strains and the last two human strains. However, the remaining four strains, from the 1997 H5N1 outbreak in Hong Kong and the 2013 H7N9 outbreak in China, show mixed avian and human predictions. For the human strains isolated during the two outbreaks, the proteins predicted as avian indicate that the source of infection was most likely avian. Conversely, the chicken strains isolated during the two outbreaks have several proteins predicted as human, suggesting that these proteins may have adapted to a human host.

However, the prediction results for the remaining four strains were mixed between avian and human. This could be due to prediction error, or it could shed new light on the capabilities of the host tropism prediction system. In 1997, the first human infection with influenza subtype H5N1 occurred in Hong Kong [52]. A/Hong Kong/542/97 was one of the strains isolated from a human patient during that outbreak. Most of its proteins were correctly predicted as human, except HA and NA, which were predicted as avian. This indicates that the virus strain most probably originated from an avian source, as not all of its proteins had adapted to humans. The avian strain A/Chicken/Hong Kong/220/97, isolated from chicken during the same period, also contained both avian and human proteins. The prediction results suggest that three of its proteins (NP, NA and M2) may have reassorted or mutated sufficiently to adapt to humans and more closely resemble human proteins.

In another case of novel human infection by a subtype previously found only in birds, the prediction results for avian and human strains from the 2013 H7N9 outbreak in China show a similar mixed host tropism signature. An avian strain isolated from chicken during the outbreak had as many as six of its 11 proteins (PA, NA, M1, M2, NS1 and NS2) predicted as human rather than avian. Given that the strain was isolated during the outbreak, it is more likely that the system detected a genuine resemblance of these six proteins to human proteins than that it made prediction errors. A human strain, A/Shanghai/01/2014, isolated during the outbreak had five of its proteins (PB1, PB1-F2, PA, HA and NS1) predicted as avian. This suggests that the strain acquired the capability to escape its primary avian host and infect humans even though these five proteins retained avian signatures, with only the remaining six proteins (PB2, NP, NA, M1, M2 and NS2) resembling human proteins. While it remains unclear which proteins are critical for an avian strain to acquire zoonotic capability and infect humans, this result could pave the way for future work on how many proteins a zoonotic avian strain requires to infect humans.

Ironically, the strength of this host tropism prediction system lies not in its almost perfect prediction accuracy, but in the cases where it appears to misclassify avian or human proteins. The results above demonstrate the capability of the prediction system to detect potentially zoonotic avian strains. Strains isolated during the 1997 H5N1 Hong Kong outbreak and the 2013 H7N9 China outbreak showed mixed avian and human protein predictions, a distinct host tropism signature that distinguishes them from typical avian or human strains. The prediction system can detect when individual proteins within a strain differ from the strain's primary host. This suggests considerable potential for the system in influenza surveillance, where strains could be monitored continuously to detect potentially zoonotic avian strains that have yet to emerge in humans.
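The signature described above, individual proteins whose predicted host differs from the host the strain was isolated from, can be expressed as a simple rule over the 11 per-protein predictions. The sketch below is a hypothetical simplification: it flags a strain as "mixed" whenever any prediction disagrees with the isolation host, and it cannot by itself tell a genuine zoonotic signature from a prediction error. The example uses the predictions reported above for A/Shanghai/01/2014.

```python
def discordant_proteins(predictions, isolation_host):
    """Return, sorted, the proteins whose predicted host differs from isolation_host.

    predictions: protein -> "avian" or "human". Proteins without a prediction
    (missing or incomplete sequences) are simply left out of the dictionary.
    """
    return sorted(p for p, host in predictions.items() if host != isolation_host)


def tropism_signature(predictions, isolation_host):
    """Return "typical" if all predictions match the isolation host, else "mixed"."""
    return "mixed" if discordant_proteins(predictions, isolation_host) else "typical"


AVIAN_PREDICTED = {"PB1", "PB1-F2", "PA", "HA", "NS1"}
ALL_PROTEINS = ("PB2", "PB1", "PB1-F2", "PA", "HA", "NP", "NA", "M1", "M2", "NS1", "NS2")
shanghai = {p: ("avian" if p in AVIAN_PREDICTED else "human") for p in ALL_PROTEINS}

flagged = discordant_proteins(shanghai, "human")
# flagged == ["HA", "NS1", "PA", "PB1", "PB1-F2"]
signature = tropism_signature(shanghai, "human")  # "mixed"
```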


The prediction models constructed for all influenza proteins show that, besides HA and PB2, which are thought to be the major determinants of host tropism, the remaining nine proteins also differ clearly between avian and human strains. This study provides individual host tropism prediction models for all 11 influenza proteins, allowing the contribution of each protein to be weighed when judging a novel virus strain's capability to cross the species barrier. The prediction model combining all 11 proteins provides a first insight into a virus strain's host tropism, which might serve as an early warning of its host range. When the prediction models are used together as a host tropism prediction system, zoonotic strains display mixed avian and human protein predictions, distinct from those of typical avian or human strains. The host tropism prediction system might therefore be able to identify zoonotic strains from protein predictions alone. Promiscuous virus strains capable of crossing the species barrier can only be identified once the underlying host tropism is understood. With host tropism prediction models as a foundation, future work can focus on building stronger computational models that predict direct avian-to-human transmission of influenza viruses, which would be a valuable tool for the surveillance of potentially hazardous influenza strains.