Background

The development of computational methods to identify key host factors that allow viruses to interrupt and control healthy cell functions will greatly aid in the prediction of novel anti-viral drug targets [1]. Traditional systems biology approaches to understanding cell dynamics during infection include the creation of detailed kinetic models for intercellular signaling pathways. While these models are advantageous in understanding the disease state in a quantitative way, they require experimentally-derived or estimated parameters and training data [2,3,4], without which complications can arise and an accurate model can quickly become unattainable. Further, modeling studies are often limited to specific pathways and therefore fail to consider the total cellular environment as an interdependent system.

Alternatively, network analysis methods applied to protein-protein interaction (PPI) data have been used to model cell-wide systemic changes associated with disease, changes in cell function, or cell fate [5]. This strategy provides a holistic understanding of system behavior by viewing proteins as interdependent states, regardless of specific interaction mechanisms, and allows for the exploration of cell level relationships. The field of network theory is well established. Several basic network metrics like degree (the number of interactions a protein is involved in) and betweenness (the importance of a protein to information flow through a network, or, how much of a bottleneck a protein is to system behavior) [6] are commonly used to describe the significance of network components in a wide range of applications [7,8,9]. These analyses have repeatedly revealed the importance of specific proteins within biological processes that cannot be found from traditional modeling approaches [10,11,12,13,14]. Disease networks have identified genes involved with cancer [15,16,17,18], demonstrated that the genes responsible for similar diseases are likely to interact with each other [19, 20], and predicted novel drug targets [21, 22].

There is precedent for network studies of many common viruses including hepatitis C [23, 24], severe acute respiratory syndrome (SARS) [19, 25], human immunodeficiency virus (HIV) [25,26,27,28,29], and influenza virus [19, 30,31,32,33]. Past work studying the effects of influenza virus in PPI networks has focused on identifying host factors involved in virus replication and improving the prediction of drug targets but ends with an analysis of basic topological measurements. While this provides a general overview of the state of the network, it is a static snapshot of the cell and, therefore, fails to capture the dynamic nature of the cell. The next logical step in analyzing biological networks thus lies in understanding how these dynamic systems can be manipulated and exploited to manage biological properties.

In classic control theory, controllability is the idea that a deterministic system can be driven to any final state in finite time given an external input [34]. This is commonly applied to linear, time invariant dynamic systems,

$$ \frac{dx(t)}{dt}= Ax(t)+ Bu(t) $$

where A is an NxN matrix of state coefficients that describes how N molecule states, x(t), interact within the system and B is a matrix of input weights describing how external influences, u(t), impact the system. In general, a system is controllable if the controllability matrix,

$$ C=\left[B, AB,{A}^2B,\dots, {A}^{N-1}B\right] $$

is full rank, N. This means that the system can be manipulated to reach any desired combination of states within all of state space following the defined input, B. In total, a controllability analysis identifies the key components of a system that must be manipulated to drive desired system outcomes [35].
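The rank condition above can be checked directly for small systems. The following is a minimal sketch, not the analysis used in this study: the three-state chain, the input matrices `B_head` and `B_tail`, and all function names are hypothetical and chosen only for illustration. It uses exact rational arithmetic from the standard library so that the rank is not affected by rounding.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def rank(M):
    """Matrix rank by Gaussian elimination with exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def controllability_matrix(A, B):
    """Build C = [B, AB, A^2 B, ..., A^(N-1) B] as a single N-row matrix."""
    n = len(A)
    blocks, block = [], B
    for _ in range(n):
        blocks.append(block)
        block = mat_mul(A, block)
    # Concatenate the blocks column-wise, row by row.
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]

def is_controllable(A, B):
    """True if the controllability matrix has full rank N."""
    return rank(controllability_matrix(A, B)) == len(A)

# Hypothetical chain x1 -> x2 -> x3 (A[i][j] != 0 means x_j influences x_i).
A = [[0, 0, 0],
     [1, 0, 0],
     [0, 1, 0]]
B_head = [[1], [0], [0]]  # a single input acting on x1
B_tail = [[0], [0], [1]]  # a single input acting on x3

print(is_controllable(A, B_head))  # input at the head of the chain
print(is_controllable(A, B_tail))  # input at the tail of the chain
```

In this toy chain an input at the head reaches every state, whereas an input at the tail cannot influence the upstream states, which is the distinction the rank condition captures.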

An example PPI network in Fig. 1a is transformed into its state space matrix representation. With the inclusion of two independent inputs (u1 and u2), the controllability matrix is full rank. Therefore, the system is fully controllable and it is possible to drive the protein concentrations to any desired state. Applying the idea of controllability to a cell at the onset of viral infection, a virus aims to control cellular functions (the system of proteins), promote virus replication tasks, and reach a final infected cell state. While it would be advantageous to interpret the infection from this control perspective, mathematical limits due to large system dimensions prevent the direct application of traditional controllability methods to PPI networks.

Fig. 1

a An example protein-protein interaction network with three proteins and two protein translation process inputs. The state space representation of the same network demonstrates that the change in state of a protein’s concentration is a function of its current state and an input process. A classic controllability analysis demonstrates that this system is fully controllable and could, therefore, be driven to any possible state change in every protein. b Example application of robust controllability, which determines the robustness of the network after the removal of a protein. c Example application of global controllability which assesses the importance of a protein to all methods of network control

Advances in network theory have created alternative methods of network controllability evaluation which survey each node’s (protein’s) importance in the ability of an external set of inputs to fully control the network. Controllability classification is founded in “driver node” calculations: identifying the network components which must be manipulated for the system to be fully controlled (analogous to determining the non-zero elements of the B matrix in classic controllability). Without manipulation, driver nodes will remain unaffected by changes to the rest of the system, rendering the total system uncontrollable. Driver nodes are identified using the Hopcroft-Karp algorithm [36], which can be applied to any directed graph in bipartite form. This method calculates the maximum matching of the graph, that is, the largest set of edges in which no two edges share a start node or an end node, so that the matched edges link into non-overlapping paths. Because each node can only influence one of its interactors, the identification of these paths dictates the way in which control can propagate through the network. The nodes that are not included in these paths or at the start of these paths are not receiving control from a neighboring node and, therefore, require “driving”. A set of driver nodes (size ND) that is capable of controlling the total network is called a minimum input set (MIS). The MIS is not unique and the number of possible MISs scales exponentially with the size of the network [37]. After a primary MIS is calculated, two methods of controllability node classification can be used.
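The driver node calculation can be sketched on a toy network. The example below is illustrative only: it swaps in a simple augmenting-path matching in place of Hopcroft-Karp (both return a maximum matching; the simpler routine is just slower on large graphs), and the four-node network and function names are hypothetical.

```python
def maximum_matching(nodes, edges):
    """Maximum matching of a directed graph viewed in bipartite form.

    Every node has an "out" copy (as a source) and an "in" copy (as a target);
    a directed edge u -> v links the out-copy of u to the in-copy of v.
    Returns a dict mapping each matched target to the source matched to it.
    """
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    matched_by = {}  # target -> source

    def augment(u, seen):
        # Try to give source u a target, re-routing earlier matches if needed.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_by or augment(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    for u in nodes:
        augment(u, set())
    return matched_by

def driver_nodes(nodes, edges):
    """Nodes left unmatched as targets, i.e. not controlled by a neighbor.

    Caveat: a perfectly matched network returns an empty list here,
    although one input is still required in that case.
    """
    matched_by = maximum_matching(nodes, edges)
    return [v for v in nodes if v not in matched_by]

# Hypothetical network: a -> b, a -> c, c -> d
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("c", "d")]
drivers = driver_nodes(nodes, edges)
print(drivers, len(drivers))
```

Here node a can pass control to only one of b or c, so two inputs are needed. The routine reports one valid MIS; swapping which of b or c is matched to a gives another MIS of the same size, which illustrates why the MIS is not unique.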

In robust controllability (by Liu et al. [38], pictured in Fig. 1b), the MIS is re-calculated (size ND′) after removing each node from the network. The node is then classified by its effect on the manipulation required to control the network, where an increase in the size of the MIS makes it more difficult to control the network and a decrease in the size of the MIS makes it easier to control the network. The removal of an indispensable node increases the number of driver nodes (ND′ > ND), the removal of a dispensable node decreases the number of driver nodes (ND′ < ND), and the removal of a neutral node has no effect on the number of driver nodes (ND′ = ND). This method has previously been applied to many network types such as gene regulatory networks, food webs, citation networks, and PPI networks to better understand what drives the dynamics of each system [29, 38]. While it is useful to observe the structural changes to the network after the removal of singular nodes, this method only considers one possible MIS. A second global controllability method by Jia et al. [39] (pictured in Fig. 1c) classifies a node by its role across all possible MISs. A critical node is included in all possible MISs, an intermittent node is included in some possible MISs, and a redundant node is not included in any possible MISs. This method places each node in the broader context of all possible control configurations.
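Both classification schemes can be illustrated on a small directed network. The sketch below is an assumption-laden toy, not the implementation used in this study: the network and function names are hypothetical, the matching routine is a simple augmenting-path search, and the global classification uses a shortcut instead of enumerating MISs. The shortcut rests on two observations: a node with no incoming edge can never be matched and is therefore a driver in every MIS, and a node can be a driver in some MIS exactly when deleting its incoming edges leaves the maximum matching size unchanged.

```python
def matching_size(nodes, edges):
    """Size of a maximum matching (augmenting paths; fine for small graphs)."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    matched_by = {}  # target -> source

    def augment(u, seen):
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                if v not in matched_by or augment(matched_by[v], seen):
                    matched_by[v] = u
                    return True
        return False

    return sum(augment(u, set()) for u in nodes)

def n_drivers(nodes, edges):
    """ND = number of unmatched nodes, with at least one input always needed."""
    return max(len(nodes) - matching_size(nodes, edges), 1)

def robust_type(node, nodes, edges):
    """Classify a node by how its removal changes the number of drivers."""
    nd = n_drivers(nodes, edges)
    rest = [n for n in nodes if n != node]
    kept = [(u, v) for u, v in edges if node not in (u, v)]
    nd_removed = n_drivers(rest, kept)
    if nd_removed > nd:
        return "indispensable"
    if nd_removed < nd:
        return "dispensable"
    return "neutral"

def global_type(node, nodes, edges):
    """Classify a node by its role across all maximum matchings."""
    if not any(v == node for _, v in edges):
        return "critical"      # never matched, so a driver in every MIS
    without_in = [(u, v) for u, v in edges if v != node]
    if matching_size(nodes, without_in) == matching_size(nodes, edges):
        return "intermittent"  # can be left unmatched in some MIS
    return "redundant"         # matched in every maximum matching

# Hypothetical network: a -> b, b -> c, b -> d
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("b", "d")]
robust = {n: robust_type(n, nodes, edges) for n in nodes}
global_ = {n: global_type(n, nodes, edges) for n in nodes}
print(robust)
print(global_)
```

In this toy network, removing the hub b disconnects everything and raises the number of required inputs, while removing either leaf lowers it. Across all control configurations, a is always a driver, b never is, and c and d alternate.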

In total, this study aims to determine key host factors with regulatory roles specific to the influenza virus-infected cell state for the prediction of novel antiviral targets. We have completed a two-part controllability analysis of a host PPI network (HIN) and a hybrid network of human host PPI data combined with influenza A virus-host protein interaction data (VIN). The controllability characteristics of influenza virus interacting host proteins and driver proteins are compared to the characteristics of the total network. A set of 24 host factors that hold value topologically, in controllability, and functionally are identified as candidates for further study in drug development based on their specialized behavior during influenza infection.

Results

Topology of the host interaction network and virus integrated network

The directed PPI network from Vinayagam et al. [40] was restricted to confident interactions (see Methods for network construction details), creating a network containing 6281 proteins and 31,079 interactions. This network is referred to as the “Host Interaction Network” (HIN). Influenza A virus-host interactions from Watanabe et al. [41] were narrowed to 2592 directed interactions between 11 influenza A virus (IAV) proteins (HA, M1, M2, NA, NP, NS1, NS2, PA, PB1, PB2, and PB1-F2 proteins) and 752 “IAV interacting proteins” preexisting in the HIN. After integration into the HIN, the network contains 6292 proteins and 33,671 interactions. This network is referred to as the “Virus Integrated Network” (VIN).

Degree and betweenness calculations were completed for the HIN and VIN. As expected, the only proteins with altered degree after the addition of virus interactions to the network are the 752 IAV interacting proteins (marked in blue in Fig. 2a). This shift is significant for the group of IAV interacting proteins as compared to all proteins in both the VIN (log scaled median of IAV interacting proteins: 1.04; log scaled median of all proteins: 0.70; Student t-test of log scaled data p < 2.20 × 10⁻¹⁶) and the HIN (log scaled median of IAV interacting proteins: 0.85; log scaled median of all proteins: 0.70; Student t-test of log scaled data p: 5.97 × 10⁻¹²). The degree distributions of both networks are scale free (Additional file 1: Figure S1).

Fig. 2

a Degree of the VIN vs degree of the HIN where the IAV interacting proteins are marked in blue. The degree distributions of the networks are scale free. b Difference in betweenness between the VIN and HIN for proteins which exhibit a difference greater than one

Because betweenness is sensitive to the information flow through all proteins instead of only neighboring proteins, 2735 proteins exhibit an increase in betweenness after the addition of IAV interactions. Of these proteins, 207 proteins’ log betweenness exhibits an increase of 2 or more in the VIN compared to the HIN (Fig. 2b). This suggests that the addition of IAV interactions has an effect on network topology that reaches over 3.5 times the number of host proteins that are directly interacting with IAV proteins. The betweenness shift in the group of IAV interacting host proteins is significant as compared to all proteins in both the VIN (log scaled median of IAV interacting proteins: 3.23; log scaled median of all proteins: 2.82; Student t-test of log scaled data p < 2.20 × 10⁻¹⁶) and the HIN (log scaled median of IAV interacting proteins: 3.22; log scaled median of all proteins: 2.82; Student t-test of log scaled data p: 2.13 × 10⁻¹⁵). This is a result of being the limited protein set responsible for information flow from the viral proteins to the rest of the network.
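The difference between the local reach of degree and the wider reach of betweenness can be seen on a toy network. The sketch below is illustrative only: the three-protein host chain, the single viral protein `v`, and the function names are hypothetical, and betweenness is computed as unnormalised shortest-path betweenness on a directed, unweighted graph using Brandes' algorithm, which may differ in normalisation from the values reported here.

```python
from collections import deque

def degree(nodes, edges):
    """Number of interactions (incoming plus outgoing) per node."""
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def betweenness(nodes, edges):
    """Unnormalised directed shortest-path betweenness (Brandes' algorithm)."""
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
    bc = {n: 0.0 for n in nodes}
    for s in nodes:
        dist = {s: 0}
        sigma = {n: 0 for n in nodes}  # number of shortest paths from s
        sigma[s] = 1
        preds = {n: [] for n in nodes}
        order = []
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in succ[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in nodes}
        for w in reversed(order):  # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

# Hypothetical host chain p1 -> p2 -> p3, then one viral protein v -> p1.
host_nodes = ["p1", "p2", "p3"]
host_edges = [("p1", "p2"), ("p2", "p3")]
vin_nodes = host_nodes + ["v"]
vin_edges = host_edges + [("v", "p1")]

hin_deg, vin_deg = degree(host_nodes, host_edges), degree(vin_nodes, vin_edges)
hin_bc, vin_bc = betweenness(host_nodes, host_edges), betweenness(vin_nodes, vin_edges)
for p in host_nodes:
    print(p, hin_deg[p], vin_deg[p], hin_bc[p], vin_bc[p])
```

Only p1, the direct interactor, changes degree, but the betweenness of p2 also rises because new shortest paths from the viral protein now run through it.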

Driver proteins

Driver proteins (nodes) are the foundation of both types of controllability calculations, representing the protein set which must be manipulated for the system to be fully controlled. The proteins are identified through maximum matching algorithms [36]. The HIN and VIN both require ND = 2463 driver proteins to achieve controllability, suggesting that the magnitude of network control is unchanged by the influence of the IAV interactions. However, the identity of driver proteins shifts slightly as the 11 viral proteins replace 11 host proteins within the primary MIS as drivers in the VIN. Table 1 lists the identities of the 11 host proteins along with the shortest distance to an IAV protein in the network, degree, and betweenness. Of these 11 proteins, only five are directly interacting with IAV proteins. One of the remaining proteins is two steps (two interactions and one connecting protein) from any IAV protein, and the remaining five proteins are three steps from any IAV protein. The number of paths between viral proteins and these proteins is reflective of the number of paths between viral proteins and all host proteins (Fisher test p: 0.99). This supports the idea that viral interactions have lasting effects on the system’s control structure, affecting proteins that are multiple steps away.

Table 1 Identities of the proteins that are drivers in the HIN but not the VIN, with the shortest distance to an influenza A viral protein. Degree and betweenness of the proteins in the VIN are provided (with the values from the HIN in parentheses). Only 45% of these proteins are directly interacting with the viral proteins, demonstrating the cascade effect caused by the inclusion of viral interactions

Lastly, analysis finds that 8.9% of all driver proteins are also IAV interacting proteins, meaning the intersection of the two protein groups of interest comprises only 3.5% of the total network. There is a significant increase in the betweenness of driver proteins depending on their status as IAV interacting or IAV non-interacting proteins (Fisher test p < 2.2 × 10⁻¹⁶), whereas there is no significant difference in degree between the same groups (Fisher test p: 0.7161). This is further evidence that the addition of virus interactions to the network magnifies information flow through the proteins most involved in controlling network behavior.
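As a rough consistency check of the two percentages above, and assuming that the 8.9% is taken over the ND = 2463 driver proteins of the primary MIS and that the total network is the 6292-protein VIN (both are assumptions about how the figures were computed):

$$ 0.089 \times 2463 \approx 219, \qquad \frac{219}{6292} \approx 0.035 $$

so roughly 219 proteins, about 3.5% of the network, are both driver and IAV interacting proteins.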

Robust controllability

Robust controllability was calculated (see Methods) for all proteins of the HIN and VIN (as shown in Table 2 with and without parentheses, respectively). The addition of IAV interactions to the network has no effect on the distribution of classifications of host proteins, and consequently, the IAV interacting proteins. Upon entry to the VIN, the 11 IAV proteins are classified as neutral, meaning that removing these proteins does not alter the number of driver proteins required to control the VIN (ND = ND′). This reveals that the removal of singular proteins from the system is not enough to disturb the existing control structure under robust controllability.

Table 2 Robust controllability types of all proteins, driver proteins, and virus interacting proteins in the VIN (HIN in parentheses)

While none of the proteins change robust classification between networks, the aforementioned replacement of 11 host driver proteins with viral proteins after the addition of virus interactions creates a small change in robust type distribution for driver proteins. Of the displaced host proteins (deemed “robust proteins”, found in Table 1), seven are neutral and four are dispensable in the HIN, meaning that their removal from the network does not change the number of driver proteins and reduces the number of driver proteins needed, respectively. All IAV proteins are classified as dispensable in the VIN. Of the five robust proteins that are both driver and IAV interacting proteins, four are neutral and one is dispensable. The most notable change in degree and betweenness between the HIN and VIN is PRMT5, with an increase of 9 and 2250, respectively. Overall, robust controllability results suggest that the HIN is stable against potential changes in the control structure that could be caused by the addition of IAV interactions.

We developed an analysis to test if IAV is selectively targeting host proteins based on controllability characteristics. 10,000 random sets of 752 proteins (the number of IAV interacting proteins) were pulled from the host proteins of the VIN. Their robust type distributions were plotted against the classification results of IAV interacting proteins, driver proteins, and all proteins in the VIN (Fig. 3a-c). The randomly sampled sets closely resemble all proteins of the network, not the true interacting protein set, suggesting that robust controllability behavior of interacting proteins is not a coincidence of network construction (one-sided p = 0.51, 0.49, and 0.50 for indispensable, neutral, and dispensable, respectively). IAV interacting proteins tend to be indispensable compared to the percentage of all proteins that are indispensable (Fig. 3a). This suggests that viruses prefer to interact with proteins that are vital to cellular control. Driver proteins are very likely to be dispensable proteins compared to the percent of all proteins that are dispensable (Fig. 3c). Further, the mean and median log degree and betweenness of the randomly sampled protein sets are significantly lower than the same measurements of the true IAV interacting set (p < 2.2 × 10⁻¹⁶, 2.2 × 10⁻¹⁶, Fig. 4), signifying that virus interacting proteins are in positions of network significance. Overall, the robust controllability results of IAV interacting proteins suggest that the virus may be selectively targeting host proteins based on controllability characteristics.
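A random sampling comparison of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: the labels and target set are made-up toy data, the function name is hypothetical, and the one-sided empirical p-value is defined here as the share of random sets whose fraction of a given type is at least as large as in the target set, which is only one possible choice of statistic.

```python
import random

def empirical_p_value(labels, target_ids, category, n_draws=10_000, seed=0):
    """Compare a target set with random sets of the same size.

    labels     : dict mapping protein id -> controllability type
    target_ids : ids of the set of interest (e.g. virus interacting proteins)
    Returns the observed fraction of `category` in the target set and the
    share of random sets with a fraction at least that large.
    """
    rng = random.Random(seed)
    pool = list(labels)
    k = len(target_ids)
    observed = sum(labels[p] == category for p in target_ids) / k
    at_least = 0
    for _ in range(n_draws):
        sample = rng.sample(pool, k)  # random set, drawn without replacement
        fraction = sum(labels[p] == category for p in sample) / k
        if fraction >= observed:
            at_least += 1
    return observed, at_least / n_draws

# Made-up toy data: 200 proteins, the first 40 labelled indispensable.
labels = {i: ("indispensable" if i < 40 else "neutral") for i in range(200)}
# Hypothetical target set of 30 proteins, 20 of which are indispensable.
target = list(range(20)) + list(range(100, 110))

observed, p = empirical_p_value(labels, target, "indispensable", n_draws=2_000)
print(f"observed fraction = {observed:.2f}, empirical p = {p:.4f}")
```

Fixing the random seed keeps the sketch reproducible; with real data the pool would be the host proteins of the VIN and the set size would be 752.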

Fig. 3

a-c Density plots of distribution of robust controllability type for 10,000 random pulls of 752 proteins (number of virus interacting proteins in network). d-f Density plots of distribution of global controllability type for 10,000 random pulls of 752 proteins (number of virus interacting proteins in network). Values for IAV interacting proteins (blue), driver proteins (green), and all proteins (gold) are pictured for all figures

Fig. 4

Density plots of a) mean (blue) and median (green) log degree of random IAV interacting protein sets and b) mean (blue) and median (green) log betweenness of random IAV interacting protein sets. Values for the true IAV interaction set are shown as vertical lines, evidence that host proteins that directly interact with viral proteins are in positions of network significance

Global controllability

Global controllability was calculated (see Methods) for all proteins of the HIN and VIN (as shown in Table 3 with and without parentheses, respectively). Unlike in robust controllability, there is a small disturbance to global type distributions of host proteins after the addition of virus interactions. 24 host proteins shift from being classified as critical (a member of all MISs) to intermittent (a member of some MISs) proteins. Identities of these proteins (deemed “global proteins”) can be found in Table 4 along with the shortest distance to an IAV protein in the network and protein degree and betweenness. The two most notable changes in degree and betweenness between the HIN and VIN are EPH receptor A2 (EPHA2) with an increase of 1 and 93, respectively, and transferrin receptor (TFRC), with an increase of 3 and 164, respectively. All 24 global proteins are both driver and IAV interacting proteins, a group that, as mentioned, comprises only 3.5% of the total network. There are only two proteins (EPHA2 and HNRNPA0) that are also members of the robust protein set. 45% of IAV interacting proteins are never drivers, suggesting that they are always manipulated by neighboring host proteins within any possible control configuration. IAV interacting proteins are not enriched for driver proteins (Fisher test p: 0.14).

Table 3 Global types of all proteins, driver proteins, and virus interacting proteins in the VIN (HIN in parentheses)
Table 4 Identities of global proteins (proteins that shift global classification between the HIN and VIN). All identified proteins are directly interacting with viral proteins. Degree and betweenness of the proteins in the VIN are provided (with the values from the HIN in parentheses)

Again, a randomized protein set was created to test if IAV may be selectively interacting with host proteins based on their controllability characteristics. 10,000 random sets of 752 proteins (the number of IAV interacting proteins) were sampled from the host proteins of the VIN. Their global type distributions were plotted against the classification results of IAV interacting proteins, driver proteins, and all proteins in the VIN (Fig. 3d-f). As with the robust classification, the random sets closely resemble the total network (one-sided p = 0.50, 0.51, and 0.50 for critical, intermittent, and redundant, respectively). While there are no redundant driver proteins by definition, driver proteins are more likely to be intermittent proteins than critical proteins (Fig. 3d-e), where more than 75% of all driver proteins are missing from at least one possible MIS. This means the majority of possible driver proteins are able to be controlled by a neighboring protein in at least one MIS. IAV interacting proteins tend to be redundant compared to the total number of proteins that are redundant (Fig. 3f). This suggests that viruses prefer to interact with proteins that are part of existing control structures to receive input from neighboring proteins.

Overall, global calculations identify a set of proteins for consideration that are more important within the VIN than the HIN. This is demonstrated through a comparison of degree and betweenness for the identified robust and global driver sets in Fig. 5. Proteins identified in the robust analysis show little deviation in both degree (Fig. 5a) and betweenness (Fig. 5b) measures after the addition of virus-host interactions to the network. In contrast, proteins identified in the global analysis show much larger deviations in degree (Fig. 5a) and betweenness (Fig. 5b) with all proteins having a betweenness of 0 in the HIN with an up to two log unit increase in the VIN (Table 4). Because the identified proteins were not responsible for information flow until the addition of virus-host interactions to the network, this suggests that the global protein set may identify key regulators of host immune response to infection.

Fig. 5

a) Degree and b) betweenness of robust (blue) and global (green) protein sets between the HIN and VIN. While proteins identified in the robust controllability analysis do not show significant deviation in degree or betweenness, proteins identified in the global controllability analysis show a shift in both measures after the addition of viral interactions

Validation of controllability significant host factors

All proteins were checked against 6 siRNA screens for host factors involved in influenza replication (Brass et al. [42], Hao et al. [43], Karlas et al. [44], König et al. [45], Shapira et al. [46], and Watanabe et al. [41]), grouped by both robust and global controllability classifications. Less than 5% of all classifications of both types are validated by any of the 6 screens (Fig. 6), suggesting that no controllability classification is more enriched for host factors than another. This is likely due to the low agreement observed across siRNA studies [47]. However, the driver proteins that change robust and global classification have higher hit rates in siRNA screens, with 2 of 11 changing MIS proteins (SF3B4, SRPK2, 18% validation) and 5 of 24 global-identified proteins (OSMR, PPA1, PSMA5, POLE4, GDI2, 21% validation), though neither result is statistically significant (Fisher p-values of 0.685 and 0.252, respectively).
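An enrichment comparison of this kind can be set up as a 2x2 Fisher exact test. The sketch below is illustrative only: the function name is hypothetical, the background counts in the example table are made up (only the 2 of 11 split comes from the text), and the two-sided p-value is computed by summing the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: protein set of interest / all other proteins.
    Columns: screen hit / not a screen hit.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x hits in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# 2 of 11 proteins of interest are screen hits (from the text); the
# background of 1000 hits among 6270 other proteins is a made-up placeholder.
p_value = fisher_exact_two_sided(2, 9, 1000, 5270)
print(f"two-sided p = {p_value:.3f}")
```

Because the background counts are placeholders, the printed value is not expected to reproduce the p-values reported above.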

Fig. 6

Percent of each a) robust classification type and b) global classification type confirmed in 6 siRNA screens (Brass, Karlas, Shapira, Hao, König, Watanabe). None of the 6 possible classifications are more than 5% validated in the screenings, suggesting that experimental findings do not favor certain protein controllability types

An analysis of both protein sets of interest was performed using Ingenuity Pathway Analysis (IPA) [48]. The network created for the robust protein set identified cellular compromise, cell death, and cell cycle functions. The network created for the global protein set identified protein synthesis functions, all centered around NF-kB. The global network notably recognizes six proteins (EPHA2, FBL, PFKM, PSMA5, SSR1, and TFRC) for their involvement in the infection of cells (p: 9.58 × 10⁻⁴). Four proteins in the robust network (CELF1, SF3B4, SRPK2, and HNRNPA0, the last of which appears in both protein sets) were identified for their involvement in mRNA processing (p-value: 3.33 × 10⁻⁶).

Lastly, Interferome v2.01 [49] was used to determine if the 11 robust proteins and 24 global proteins are interferon regulated genes (IRGs). All 11 robust proteins are identified as IRGs and exhibit a 2-fold change in expression when treated with interferon in at least one experimental dataset. 20 of 24 global proteins are identified as IRGs and exhibit a 2-fold change in expression in at least one experimental dataset. 6 global proteins are identified in more than 10 studies. In particular, HNRNPA0 and PPA1 are significantly down regulated in 20 and 63 datasets, respectively. These results point toward the involvement of the predicted protein subsets in immune response events.

Discussion

A network representation of the cellular environment demonstrates that the effects of infection (represented by the addition of virus-host interactions) cascade through the system, demonstrated by the alteration of basic topology measures. The betweenness shift between the two networks, particularly in IAV interacting proteins, supplies evidence that the topological effect of viral infection is wide reaching (Tables 1 and 4). Further, a comparison of driver protein betweenness for those that are also IAV interacting proteins in comparison to those that are not shows a significant difference. Driver proteins that are IAV interacting are not receiving control influence from viral proteins (dictated by the maximum matching requirement that each protein only control a single protein) and require additional external influence to achieve network control. However, the increased betweenness of proteins that are both driver and IAV interacting proteins suggests that this group is still of great importance to information flow through the network. This is one example where differences in network topology measures can emphasize the importance of select proteins that are overlooked by controllability principles.

Controllability analyses confirm that IAV interacting proteins are in positions of significance for both types of classification. The increased population of indispensable IAV interacting proteins (robust controllability: ND′ > ND, Fig. 3a) compared to what would be expected by random chance suggests that it would be more difficult for an outside influence (such as viral infection) to control the network after removing the IAV interacting proteins as opposed to a randomly selected protein. This is logical as IAV interacting proteins act as the connection between viral proteins and the host network where control is initiated. The increased population of redundant IAV interacting proteins (global controllability: never a driver protein, Fig. 3f) when compared to the random expectation indicates that more IAV interacting proteins are always being manipulated internally than would be expected by chance. This means that they are fully incorporated into the control structure of the VIN. From these two results, one can conclude that IAV interacting proteins contribute to both the “gate” (the ease of entering the system) and the “heart” (the proteins responsible for propagating control through the system) of the network control structure during infection. These findings support the idea that viruses are likely to interact with proteins which offer an advantage to total network control.

Similarly, both sets of controllability results demonstrate that driver proteins play interesting roles in the network control structure. The large population of dispensable driver proteins (robust controllability: ND′ < ND, Table 2) signifies that the majority of driver proteins are making it more difficult to control the network by requiring more external inputs to control system behavior. In their absence, the number of driver proteins would decrease and it would theoretically be easier for a viral attack to compromise the network control structure. As such, a possible strategy for drug development could be to protect these proteins from repression effects during infection. Over 75% of driver proteins are classified as intermittent (global controllability: sometimes a driver protein, Table 3), meaning there is at least one MIS where these driver proteins are not drivers, and receive control influence through internal propagation. This lends itself to the idea of viral escape routes: under pressure, virus proteins could utilize alternative pathways to maintain system control and reach the goal of hijacking cellular function.

The method of controllability implementation used identifies protein sets of interest through changes to classification between the HIN and VIN. Unfortunately, robust classification methods do not detect a change between the two networks in this study. As it is a measure of the robustness of the network to structural changes in the absence of each protein, this suggests that the HIN upholds its typical control structure during IAV infection. This could be a consequence of the interaction data used or it may be that the strategy applied here cannot distinguish between the behavior of healthy and diseased states. Knowing the extent of changes to cell behavior within immune response pathways [50,51,52], apoptosis signaling [53, 54], and transcriptional processes [55,56,57] during infection, the IAV infected cell can be interpreted as a different system. The failure to see this distinction may be a shortcoming of the robust controllability calculation, especially knowing that the 11 robust proteins are not unique due to the method’s use of a single MIS. Overall, the robust analysis should be applied to additional virus-host networks in the fashion described within this study to further evaluate the method.

The 24 proteins identified by the global controllability analysis show promise as indicators of regulatory roles specific to the infected state. All global proteins are IAV interacting and driver proteins, a high distinction which demonstrates a significant importance to network information flow marked by significantly higher betweenness in the VIN than even driver proteins that are not IAV interacting. Additionally, all global proteins have no importance to network flow in the HIN (betweenness = 0) (Table 4), suggesting their role in network structure “turns on” after the onset of infection. It is noteworthy that PRDX1 has been implicated in respiratory syncytial virus (RSV) [58], a lower respiratory tract infection that is often associated with influenza virus [59]. Though the number of global proteins identified in existing siRNA screening data is not statistically significant, it should be noted that siRNA screens cover only part of the genome. As such, this type of analysis could be used to direct future experimental studies to save time, money, and effort. IPA analysis reveals that some of the identified proteins hold roles in mRNA processing, an integral part of the influenza virus’ ability to spread through processing its own RNA using host machinery [60]. The global protein network is centered around NF-kB, which is implicated in host immunity with evidence that the virus directly inhibits NF-kB activity [61, 62]. The interferon regulating roles of proteins in a high number of both identified sets (all 11 changing MIS proteins and 20 of 24 global-identified proteins) speak to their responsibility in controlling infection. PPA1 appears as downregulated in 63 studies and HNRNPA0 appears as downregulated in 20 studies when treated with interferon compared to a control, solidifying their involvement in the host immune response. In total, this evidence suggests that controllability analyses hold power as predictors for important regulators of the host response to influenza infection and, therefore, hold power for drug target prediction.

Existing influenza virus studies using PPI networks require additional data such as differentially expressed gene information [63] or protein context [30] to construct host response networks. Alternative methods such as DeltaNet [64, 65] and ProTINA [66] utilize gene transcription profiles to infer protein drug targets, but rely on the accurate deduction of gene regulatory networks. More recent PPI studies have used network growing functions such as GeneMANIA, STRING, and IPA [67] to predict IAV host factors and studied infected cell systems through the integration of screening data with network methods [33, 68]. Approaches incorporating time course data into network analysis have also been explored [69]. While these methods (which include basic network metrics such as degree and betweenness of PPI networks) have been successful at identifying disease host factors and in drug target development in the existing body of work, this dual controllability study offers a novel, in-depth analysis of the role of individual proteins in the context of total system function and how possible changes to the system can be interpreted.

Lastly, though this study has used experimental data from influenza A studies, the generality of the method means that this analysis can be used to improve the prediction of drug targets for any pathogen-host interaction for which protein interaction data are available. The limits of these methods lie in the scarcity of large-scale, dependable databases of protein-protein interactions. Foundational maximum matching algorithms for the calculation of driver proteins must be performed with directed networks. While larger directed networks than the network from Vinayagam et al. [40] are available [70], the network used here contains only experimentally derived data, as opposed to computationally predicted interactions, assuring biological confidence in the results within this study. A robust controllability analysis of the computationally predicted network presented in Uhart et al. [70] finds that 29% of proteins are categorized as indispensable, whereas approximately 20% of proteins in the Vinayagam network receive the same classification, though there is 89% overlap in directed edges between the two networks. This suggests that methods for predicting protein interactions may overrepresent these key proteins within the analysis, even in combination with experimental results. However, larger networks will move towards a more complete analysis of infected cell behavior and possibly reveal further proteins of interest. Therefore, the future of this field depends on continued establishment of large, confident, directed PPI networks.

Conclusions

In total, this two-part network controllability analysis for a host protein-protein interaction network (HIN) and an integrated influenza virus-host protein-protein interaction network (VIN) aims to enhance the prediction of antiviral drug targets for influenza A virus. While robust controllability methods have previously been applied to study PPI networks [29], past analysis focuses only on the classification of virus interacting proteins and does not compare the network before and after the addition of virus-host interactions. A global controllability analysis has never been applied to PPI networks. The unique construction of the VIN includes experimentally-derived virus-host interaction data [41], which represent opportunities for the virus to manipulate host intracellular machinery using protein-protein interactions. Here, analysis of the transition between the healthy and infected network states and further investigation of virus interacting and driver proteins has identified 24 proteins as regulatory markers of the infected state. This protein set is noted for its characteristics in topology, controllability, and functional roles within the infected cell; these results are summarized in Table 5. Our workflow observes both the effect of structural changes to the network in the case of potential protein knockouts and each protein’s role in all MISs, representing all possible ways of controlling the system. In combination, the network approach and results provide a deeper understanding of how changes to cell behavior at the onset of infection are able to occur through the work of a small set of viral proteins. Through understanding the system in this way, we present the possibility to “outsmart” viral attack by dismantling the control structure which allows the viral infection to take hold.

Table 5 Summary of results for proteins identified in the global controllability analysis

Methods

Protein-protein interaction network

The host protein-protein interaction network (from Vinayagam et al. [40]) is the combination of interactions identified in two or more repetitions of Y2H screens within the study and known, experimentally identified interactions from literature, where interactions had been given direction using a naïve Bayesian predictor. After retrieving the network, a confidence level cutoff of 0.7 was used based on the correlation between confidence scores and biological relevance reported in Yu et al. [71]. This network is the HIN. Influenza A virus-host interactions detected by Co-IP RNAi assay in Watanabe et al. [41] were narrowed to interactions which contained host proteins already found within the HIN to avoid skewing degree and betweenness network metrics. All virus-host interactions are directed from viral protein to host protein. These interactions were directly integrated into the host network, creating the VIN. All analysis was completed in R 3.4.3 using the igraph package.
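
The construction described above can be illustrated with a minimal sketch. It is written in plain Python rather than the R and igraph environment used in the study, and it is not the authors' code. The function name, the tuple layout of the inputs, the toy protein identifiers, and the choice to keep scores greater than or equal to the cutoff are all assumptions made for illustration.

```python
def build_networks(host_edges, virus_host_edges, cutoff=0.7):
    """Return (HIN, VIN) as sets of directed (source, target) edges.

    host_edges: iterable of (source, target, confidence) tuples (assumed layout).
    virus_host_edges: iterable of (viral_protein, host_protein) tuples.
    """
    # Keep directed host interactions at or above the cutoff
    # (">=" rather than ">" is an assumption).
    hin = {(src, dst) for src, dst, score in host_edges if score >= cutoff}
    hin_proteins = {protein for edge in hin for protein in edge}
    # Add viral -> host edges only when the host protein is already in the HIN.
    viral_edges = {(viral, host) for viral, host in virus_host_edges
                   if host in hin_proteins}
    return hin, hin | viral_edges


# Toy data with made-up identifiers (not taken from the study).
host_edges = [("H1", "H2", 0.9), ("H2", "H3", 0.75), ("H3", "H4", 0.4)]
virus_host_edges = [("V1", "H2"), ("V2", "H4")]
hin, vin = build_networks(host_edges, virus_host_edges)
print(sorted(vin - hin))  # [('V1', 'H2')]
```

In the toy example, the interaction involving H4 falls below the cutoff, so H4 is absent from the HIN and the viral interaction targeting it is not integrated into the VIN.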

Robust controllability classification

Calculations for robust classification were adopted from Liu et al. [38]. For a network of N nodes, a minimum set of driver nodes, of size ND, is found for the bipartite representation of the network using a maximum matching algorithm such as Hopcroft-Karp [36]. Each node of the network is removed in turn (leaving N − 1 nodes) and the number of driver nodes after removal, ND′, is reevaluated by maximum matching. Nodes are classified as indispensable (ND′ > ND), neutral (ND′ = ND), or dispensable (ND′ < ND).
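
As an illustration of this procedure, the following plain-Python sketch classifies the nodes of a small directed network. It is not the implementation used in the study: a simple augmenting-path matching stands in for Hopcroft-Karp (both return a maximum matching of the same size), the function names are hypothetical, and the convention that at least one driver node is always required (ND = max(N − |matching|, 1)) is an assumption taken from the general controllability literature.

```python
def max_matching_size(nodes, edges):
    """Maximum matching size in the bipartite (out-copy, in-copy) representation."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    matched_to = {}  # in-copy v -> out-copy u currently matched to it

    def augment(u, seen):
        # Try to match the out-copy of u, rerouting earlier matches if needed.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_to or augment(matched_to[v], seen):
                matched_to[v] = u
                return True
        return False

    return sum(augment(u, set()) for u in nodes)


def n_drivers(nodes, edges):
    # Unmatched nodes are drivers; at least one input is always assumed.
    return max(len(nodes) - max_matching_size(nodes, edges), 1)


def robust_classes(nodes, edges):
    nodes, edges = list(nodes), list(edges)
    nd = n_drivers(nodes, edges)
    classes = {}
    for x in nodes:
        rest = [n for n in nodes if n != x]
        rest_edges = [(u, v) for u, v in edges if x not in (u, v)]
        nd_removed = n_drivers(rest, rest_edges)  # ND' after removing x
        if nd_removed > nd:
            classes[x] = "indispensable"
        elif nd_removed < nd:
            classes[x] = "dispensable"
        else:
            classes[x] = "neutral"
    return classes


# Toy networks: a directed path a -> b -> c and a star a -> b, a -> c.
print(robust_classes("abc", [("a", "b"), ("b", "c")]))
print(robust_classes("abc", [("a", "b"), ("a", "c")]))
```

In the path, removing the middle node b splits the network and raises the number of driver nodes, so b is indispensable. In the star, removing either leaf lowers the number of driver nodes, so both leaves are dispensable. The recursive matching is intended only for small examples; a network of the size analyzed here would call for an iterative or Hopcroft-Karp implementation.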

Global controllability classification

Calculations for global classification were adopted from Jia et al. [39]. For a network of n nodes, a set of driver nodes for the bipartite representation of the network, ND, is found using a maximum matching algorithm such as Hopcroft-Karp [36]. For all ND, control adjacent nodes were identified iteratively and an input graph was created as dictated in Zhang et al. [72]. The input graph was used to classify nodes as critical (in all minimum input sets), neutral (in some minimum input sets), or redundant (in no minimum input sets).
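
The three categories can be illustrated on a small network with the sketch below. It does not reproduce the input graph construction of Zhang et al. [72]; instead it substitutes a simpler matching-based test that yields the same categories under stated assumptions. A node with no incoming edges can never be matched and is therefore in every minimum input set. A node with incoming edges is in no minimum input set exactly when deleting its incoming edges reduces the maximum matching size. The function names are hypothetical, a simple augmenting-path matching stands in for Hopcroft-Karp, and the sketch assumes the network has no perfect matching (so that driver nodes are exactly the unmatched nodes).

```python
def max_matching_size(nodes, edges):
    """Maximum matching size in the bipartite (out-copy, in-copy) representation."""
    succ = {u: [] for u in nodes}
    for u, v in edges:
        succ[u].append(v)
    matched_to = {}  # in-copy v -> out-copy u currently matched to it

    def augment(u, seen):
        # Try to match the out-copy of u, rerouting earlier matches if needed.
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_to or augment(matched_to[v], seen):
                matched_to[v] = u
                return True
        return False

    return sum(augment(u, set()) for u in nodes)


def global_classes(nodes, edges):
    nodes, edges = list(nodes), list(edges)
    full = max_matching_size(nodes, edges)
    has_incoming = {v for _, v in edges}
    classes = {}
    for x in nodes:
        if x not in has_incoming:
            # Never matched, so a driver in every minimum input set.
            classes[x] = "critical"
            continue
        # Force x to be unmatched by deleting its incoming edges.
        without_in = [(u, v) for u, v in edges if v != x]
        if max_matching_size(nodes, without_in) == full:
            classes[x] = "neutral"    # a driver in some minimum input sets
        else:
            classes[x] = "redundant"  # matched in every maximum matching
    return classes


# Toy network: a -> b, a -> c, c -> d (identifiers are made up).
print(global_classes("abcd", [("a", "b"), ("a", "c"), ("c", "d")]))
```

For the toy network the minimum input sets are {a, b} and {a, c}, so a is critical, b and c are neutral, and d is redundant. Recomputing a matching for every node is slower than the input graph approach and is shown only because it is easy to verify by hand.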