Background

Influenza A virus (IAV) infection is a worldwide public health threat [1, 2]. IAV causes respiratory tract infections and leads to inflammatory responses. Controlling the inflammatory response resulting from an IAV infection is of great significance in reducing associated tissue damage. However, many biological experiments have demonstrated that IAV infection-induced inflammatory responses are extremely complicated and regulated by dynamic networks [3–5]. Experiments that investigate the interactions among individual inflammatory factors have not provided a sufficiently detailed, multidimensional view of inflammatory regulatory networks (IRNs). These mechanisms need to be investigated at the system level and in terms of network dynamics. Therefore, the construction of large, cell-specific IRNs based on high-throughput data is essential for investigating the molecular mechanisms of inflammatory responses during IAV infection.

Biological experiments have found that IAVs induce the expression of a number of inflammatory molecules, cytokines and chemokines, such as IL27, IL32, IL6, TNF, IFNG, CXCL10, CCL3, NOS2 and IL8 [6–9]. Furthermore, a number of studies have shown that H5N1 viruses can induce increased gene transcription of pro-inflammatory cytokines, including CXCL10, IFN-β, IL6, COX-2 (cyclooxygenase-2) and CCL5 [9–12]. In particular, COX-2 is the primary mediator in protection against IAV infection [4] and has been shown to play a regulatory role in the induction of the H5N1-mediated pro-inflammatory cascade [10, 11]. It is important to further investigate the mechanisms of the inflammatory cascade downstream of COX-2 regulation that may be involved in H5N1 infection [13]. To the best of our knowledge, studies on constructing a cell-specific IRN after IAV infection are limited, and an integrated and systematic analysis of the COX-2-mediated inflammatory cascade that incorporates microarray data has not yet been reported.

A number of different methods for inferring gene regulatory networks (GRNs) from high-throughput data have been proposed [14–20]. However, only a few studies have constructed dynamic signaling networks, using stoichiometric approaches, discrete Boolean models, fuzzy logic models, integer programming or ordinary differential equation (ODE)-based methods [15, 21–27]. No study has reported combining a priori knowledge of network topology with nonlinear optimization algorithms to identify a dynamic regulatory network. When reconstructing networks from expression data based on a priori knowledge of network topology, the most important steps are converting familiar network maps into mathematical models and fitting the model's structural parameters to the available data. Recently, the rough topological structure of human inflammatory networks with 2361 nodes and 63276 edges has been obtained, which provides a prelude to more detailed network analysis and mathematical modeling of an inflammatory network [28]. In this study, we propose a computational method that combines information theory-based mutual information (MI) with nonlinear ODE-based optimization to construct a cell-specific IRN mediated by COX-2 during IAV infection. A differential evolution (DE) algorithm was used to optimize the network so that it best fits the experimental data. Furthermore, we performed Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and gene ontology (GO) term enrichment analyses on the optimized IRN to identify the underlying mechanisms during IAV infection.

Methods

The flowchart of our work is presented in Figure 1 and mainly consists of six steps: constructing an initial IRN, simplifying the initial network, building a mathematical model, estimating the model parameters with the DE algorithm, testing the significance and sensitivity of the regulations, and performing an enrichment analysis.

Figure 1 The flowchart of our work.

Data collection and construction of the initial inflammatory regulatory network

To construct a cell-specific IRN and investigate the mechanisms of the inflammatory cascade mediated by COX-2 in IAV infection, we selected 59 proteins, which are listed in Table 1, that are associated with the inflammatory responses regulated by COX-2 based on the published literature [3, 9, 13, 29, 30]. The microarray data were retrieved from the Gene Expression Omnibus (GEO) database under the GEO accession number GSE28166 [31, 32]. This dataset contains 36 samples in total, with 3 mock and infected replicates for each time point. In this study, the expression levels of complexes were the average of the gene expression levels of the members of the complex from the dataset. The expression level of IAV was obtained from the literature [32].

Table 1 The 59 proteins selected from the published literature

Network construction based on these 59 proteins was performed using Ingenuity Pathway Analysis (IPA) software (Ingenuity Systems, http://www.ingenuity.com). The official symbol of each protein was imported into the IPA software. Through IPA analysis, we identified a total of 7 networks based on functional connectivity. Three of these networks shared common proteins, and it was possible to generate a merged network (data not shown). The merged network is very complicated and includes a few proteins that are not on our protein list. We pruned the network by removing the proteins that were not on our list, except for some common and important molecules such as NFκB, IL12 (complex), p38 MAPK, JAK, STAT, IFN-α and IFN-β. In addition, we integrated two molecules, IAV and COX-2, into the merged network. IL32, IL29, IL27, IL1β and IFN-α/β/γ have been reported to inhibit viral replication [3, 33–37]. We thus obtained our initial IRN, comprising 51 proteins (or complexes) and 198 interactions. The network is depicted in Additional file 1. The full name of each protein in the initial IRN is listed in Additional file 2.

Simplifying the initial IRN based on conditional mutual information

PCA-CMI, which was originally proposed by Zhang et al. for inferring GRNs from gene expression data, considers the nonlinear dependence and topological structure of GRNs by employing a path consistency algorithm (PCA) based on conditional mutual information (CMI) [20]. In this study, we used the PCA-CMI method to distinguish direct (or causal) interactions from indirect associations.

For a discrete variable X, the entropy H(X) is the measure of average uncertainty of variable X and can be defined by:

$$H(X) = -\sum_{x \in X} p(x) \log p(x), \tag{1}$$

where p(x) is the probability of each discrete value x in X.

Mutual information (MI) measures the dependency between two variables (genes or proteins). For discrete variables X and Y, MI is defined by the following equation:

$$I(X, Y) = \sum_{x \in X,\, y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}. \tag{2}$$

CMI measures conditional dependency between two variables given other variable(s). The CMI of variables X and Y given Z is defined as:

$$I(X, Y \mid Z) = \sum_{x \in X,\, y \in Y,\, z \in Z} p(x, y, z) \log \frac{p(x, y \mid z)}{p(x \mid z)\, p(y \mid z)}. \tag{3}$$

With the widely adopted Gaussian kernel probability density estimator, the equations (1), (2) and (3) can be easily calculated using the following equivalent equations [15, 20].

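$$H(X) = \frac{1}{2} \log \left[ (2 \pi e)^{n} \, |C| \right], \tag{4}$$

$$I(X, Y) = \frac{1}{2} \log \frac{|C(X)| \cdot |C(Y)|}{|C(X, Y)|}, \tag{5}$$

$$I(X, Y \mid Z) = \frac{1}{2} \log \frac{|C(X, Z)| \cdot |C(Y, Z)|}{|C(Z)| \cdot |C(X, Y, Z)|}, \tag{6}$$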

where C is the covariance matrix of the variable(s) given in parentheses (for example, C(X, Y) is the covariance matrix of X and Y taken jointly), |C| is the determinant of matrix C, and n is the number of variables in C.
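
Equations (5) and (6) can be evaluated directly from sample covariance determinants. The following is a minimal sketch, not the authors' implementation: the function names (`det`, `cov_det`, `gaussian_mi`, `gaussian_cmi`) and the synthetic data are illustrative assumptions. The toy data use a common cause Z for X and Y, so the MI between X and Y should be clearly positive while the CMI given Z should be close to zero.

```python
import math
import random


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-300:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            d = -d
        d *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return d


def cov_det(*series):
    """|C| for the sample covariance matrix of equal-length series."""
    n = len(series[0])
    means = [sum(s) / n for s in series]
    cov = [[sum((a[k] - ma) * (b[k] - mb) for k in range(n)) / (n - 1)
            for b, mb in zip(series, means)]
           for a, ma in zip(series, means)]
    return det(cov)


def gaussian_mi(x, y):
    """Equation (5): MI from covariance determinants."""
    return 0.5 * math.log(cov_det(x) * cov_det(y) / cov_det(x, y))


def gaussian_cmi(x, y, z):
    """Equation (6): CMI of x and y given a single conditioning variable z."""
    return 0.5 * math.log(cov_det(x, z) * cov_det(y, z)
                          / (cov_det(z) * cov_det(x, y, z)))


# Synthetic example (assumption): Z drives both X and Y.
random.seed(0)
z = [random.gauss(0, 1) for _ in range(2000)]
x = [v + 0.5 * random.gauss(0, 1) for v in z]
y = [v + 0.5 * random.gauss(0, 1) for v in z]
mi_xy = gaussian_mi(x, y)
cmi_xy_z = gaussian_cmi(x, y, z)
print("MI(X,Y) =", mi_xy, " CMI(X,Y|Z) =", cmi_xy_z)
```

In this example the association between X and Y is explained entirely by Z, which is the situation in which the CMI, unlike the MI, indicates that no direct edge is needed.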

A high MI value indicates a close relationship between two variables, while a low MI value implies that they are independent. Similarly, a high CMI value indicates a close relationship between X and Y given Z, while a low CMI value indicates that X and Y are conditionally independent given Z. If the MI or CMI value of an edge is smaller than a given threshold θ, the two proteins are considered independent and the edge between them is deleted (see [20] for the detailed PCA-CMI procedure).

We simplified the initial IRN with PCA-CMI by recursively deleting edges between independent proteins, from low-order to high-order (conditional) independence, until no further edge could be deleted. The threshold θ for MI and CMI was set to 0.1, and its significance was tested with a Z-statistic [15, 20, 38] (Figure 2). The simplified network, which contains 50 nodes and 142 directed edges, is depicted in Figure 3.
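
The pruning step can be illustrated with a simplified sketch restricted to zero-order (MI) and first-order (CMI given one common neighbour) tests. It is not the published PCA-CMI code. For jointly Gaussian variables, Equations (5) and (6) reduce to −½ log(1 − r²) with r the Pearson or first-order partial correlation, and the sketch uses that shortcut. The rule of deleting an edge as soon as any single common neighbour drives the CMI below θ is an assumption made here for brevity; the selection rule in [20] may differ. Node names and data are synthetic.

```python
import math
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mi(x, y):
    """Gaussian MI expressed through the Pearson correlation."""
    r = pearson(x, y)
    return -0.5 * math.log(1.0 - r * r)


def cmi(x, y, z):
    """Gaussian first-order CMI expressed through the partial correlation."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    partial = (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    return -0.5 * math.log(1.0 - partial * partial)


def prune(edges, data, theta=0.1):
    """Delete candidate edges whose MI or first-order CMI is below theta."""
    # Order 0: drop edges between marginally independent nodes.
    kept = {e for e in edges if mi(data[e[0]], data[e[1]]) >= theta}
    # Snapshot of the adjacency so that order-1 decisions do not depend
    # on the order in which edges are visited.
    adj = {}
    for u, v in kept:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    survivors = set()
    for u, v in kept:
        common = adj[u] & adj[v]
        if all(cmi(data[u], data[v], data[k]) >= theta for k in common):
            survivors.add((u, v))
    return survivors


# Synthetic cascade A -> B -> C plus an unrelated node D (assumption).
random.seed(1)
n = 500
A = [random.gauss(0, 1) for _ in range(n)]
B = [a + 0.5 * random.gauss(0, 1) for a in A]
C = [b + 0.5 * random.gauss(0, 1) for b in B]
D = [random.gauss(0, 1) for _ in range(n)]
data = {"A": A, "B": B, "C": C, "D": D}
candidates = {("A", "B"), ("B", "C"), ("A", "C"), ("A", "D")}
result = prune(candidates, data, theta=0.1)
print(sorted(result))
```

Here the edge A–D is removed at order 0, and the indirect edge A–C is removed at order 1 because conditioning on B explains the association.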

Figure 2 The significance test for the selected threshold value of MI and CMI. The x-axis shows the Z-statistic values of the edges deleted when the threshold value of MI and CMI is set to 0.1. The y-axis shows the number of edges whose Z-statistics fall into the corresponding bins. The blue dashed line marks the value of the inverse cumulative distribution function of N(0,1) at significance level α=0.1.

Figure 3 The simplified inflammatory regulatory network based on mutual information. Lines ending in arrows and bars denote positive and negative regulatory interactions, respectively. Lines without arrows or bars represent binding interactions.

Mathematical model of the network

To simplify the IRN further, we built nonlinear ordinary differential equations (ODEs) to model the reaction kinetics of the regulatory network. The ODEs describe the relationship between the reaction rate and the concentrations of the reactants. The change in concentration of a reactant is characterized by a function that takes the regulatory influence (activation or inhibition) of other reactants into account. The general form of the nonlinear ODEs is as follows:

$$\frac{dx_i}{dt} = f_i\left(x_1, x_2, \ldots, x_m, k_{i\_j}\right) - d_i x_i,$$

where $x_i$ is the concentration of species i, $f_i$ is a nonlinear function, m is the number of species in the system, $k_{i\_j}$ is the kinetic parameter with i, j ∈ {1, 2, …, m} and $d_i$ is the degradation rate of species i.

Based on the law of mass action and Hill functions, we built a system of 50 nonlinear ODEs with 192 kinetic parameters. All equations, their explanations and the initial concentrations of the proteins are listed in Additional file 3.
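
To make the general form concrete, the sketch below simulates a hypothetical two-species system in which species 2 inhibits the production of species 1 and species 1 activates the production of species 2, each with first-order degradation. The Hill forms, the parameter values and the fixed-step Runge-Kutta integrator are illustrative assumptions; the actual 50 equations are those given in Additional file 3.

```python
def hill_activation(x, k, half=1.0, n=2):
    """Production rate that increases with the regulator level x."""
    return k * x ** n / (half ** n + x ** n)


def hill_inhibition(x, k, half=1.0, n=2):
    """Production rate that decreases with the regulator level x."""
    return k * half ** n / (half ** n + x ** n)


def rhs(state, p):
    """dx_i/dt = f_i(...) - d_i * x_i for the two hypothetical species."""
    x1, x2 = state
    return [
        hill_inhibition(x2, p["k1_2"]) - p["d1"] * x1,
        hill_activation(x1, p["k2_1"]) - p["d2"] * x2,
    ]


def rk4_step(f, state, p, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, scale):
        return [s + scale * dt * ki for s, ki in zip(state, k)]
    k1 = f(state, p)
    k2 = f(shifted(k1, 0.5), p)
    k3 = f(shifted(k2, 0.5), p)
    k4 = f(shifted(k3, 1.0), p)
    return [s + dt / 6.0 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def simulate(f, state, p, t_end, dt=0.01):
    """Integrate from t = 0 to t_end and return the whole trajectory."""
    trajectory = [list(state)]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(f, state, p, dt)
        trajectory.append(state)
    return trajectory


# Hypothetical parameter values, chosen only for illustration.
params = {"k1_2": 2.0, "k2_1": 2.0, "d1": 1.0, "d2": 1.0}
trajectory = simulate(rhs, [0.1, 0.0], params, t_end=50.0)
final = trajectory[-1]
print("state at t = 50:", final)
```

With these values the system has a steady state at x1 = x2 = 1, where production balances degradation for both species, and the trajectory settles there.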

Estimation of the kinetic parameters in the model with the DE algorithm

The parameters in our ODEs fall into two categories: regulatory parameters, which represent activation or inhibition relations, and degradation parameters, which represent the degradation of individual biomolecular species.

The problem of identifying the kinetic parameters in the model can be converted into the following nonlinear optimization problem, which minimizes the error between the values simulated by our model and the experimental data.

$$\min_{K} \sum_{i=1}^{N} \sum_{j=1}^{M} \omega_i \left( x_i^{sim}(t_j, K) - x_i^{exp}(t_j) \right)^2,$$

where $x_i^{exp}(t_j)$ and $x_i^{sim}(t_j, K)$ are the experimental data and the simulated value of species i at time point $t_j$, respectively, $\omega_i = 1 / \left( \max_j x_i^{exp}(t_j) \right)^2$, K is the parameter set consisting of all the parameters in the model, N is the number of species and M is the number of time points in the biological experiments.
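
The effect of the weights ω_i can be seen in a small sketch of the objective. The function name `weighted_sse` and the numbers are illustrative assumptions; the simulated values would normally come from integrating the ODEs for a given parameter set K.

```python
def weighted_sse(sim, exp):
    """Weighted sum of squared errors over species and time points.

    sim, exp: dict mapping species name -> list of values at the same
    time points. Each species is weighted by 1 / (max experimental value)^2.
    """
    total = 0.0
    for species, exp_values in exp.items():
        weight = 1.0 / max(exp_values) ** 2
        total += weight * sum((s - e) ** 2
                              for s, e in zip(sim[species], exp_values))
    return total


# Two hypothetical species whose expression scales differ tenfold.
exp_data = {"a": [1.0, 2.0, 4.0], "b": [10.0, 20.0, 40.0]}
# A simulation that overshoots every value by 10 %.
sim_data = {name: [1.1 * v for v in values]
            for name, values in exp_data.items()}
objective = weighted_sse(sim_data, exp_data)
print(objective)
```

Because each species is normalized by its own maximum, the same relative misfit contributes equally for both species, so highly expressed genes do not dominate the fit.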

A wide variety of global optimization techniques have been developed to address nonlinear optimization problems [39–42]. The DE algorithm, proposed by Storn and Price [42], is a very successful and powerful population-based stochastic search technique for solving global optimization problems and has been widely applied in many scientific and engineering fields [43, 44]. The DE algorithm is described as follows:

Step 1. Initialization: Generate a random initial population $X_G = \{X_{1,G}, X_{2,G}, \ldots, X_{N,G}\}$, where $X_{i,G} = \{X_{i1,G}, X_{i2,G}, \ldots, X_{iD,G}\}$. Here N is the population size, G is the generation, $X_{i,G}$ is a set of rate constants and D is the number of kinetic parameters.

Step 2. Genetic operation:

  1) Mutation: $V_{i,G+1} = X_{r_1,G} + F\,(X_{r_2,G} - X_{r_3,G})$, where $r_1, r_2, r_3 \in \{1, \ldots, N\} \setminus \{i\}$ are mutually distinct indices and $F \in [0, 2]$.

  2) Crossover: $U_{i,G+1} = (U_{1i,G+1}, \ldots, U_{Di,G+1})$, with

$$U_{ji,G+1} = \begin{cases} V_{ji,G+1}, & \text{if } rand(b(j)) \le CR \text{ or } j = rnbr(i), \\ X_{ji,G}, & \text{if } rand(b(j)) > CR \text{ and } j \ne rnbr(i), \end{cases} \qquad j = 1, 2, \ldots, D,$$

where rand(b(j)) is the jth evaluation of a uniform random number in [0,1], CR is the crossover constant in [0,1] and rnbr(i) is a random index in {1,…,D}.

Step 3. Selection: If $f(U_{i,G+1}) \le f(X_{i,G})$, then $X_{i,G+1} = U_{i,G+1}$; otherwise $X_{i,G+1} = X_{i,G}$, where f is the objective function.
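
The three steps can be sketched compactly as follows. This is a generic illustration of the scheme described above, not the code used in this study: the function name, the default values of F, CR, population size and generation count, and the quadratic toy objective are all assumptions. In the study itself the objective would be the weighted error between simulated and measured expression, and bound handling (for example, keeping kinetic parameters non-negative) would have to be added.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=300, seed=0):
    """Minimize f with mutation, binomial crossover and greedy selection."""
    rng = random.Random(seed)
    D = len(bounds)
    # Step 1: random initial population inside the given bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        new_pop, new_cost = [], []
        for i in range(pop_size):
            # Step 2.1: mutation from three distinct members other than i.
            r1, r2, r3 = rng.sample(
                [r for r in range(pop_size) if r != i], 3)
            mutant = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                      for j in range(D)]
            # Step 2.2: crossover; index `forced` plays the role of rnbr(i)
            # and guarantees at least one component from the mutant.
            forced = rng.randrange(D)
            trial = [mutant[j] if (rng.random() <= CR or j == forced)
                     else pop[i][j] for j in range(D)]
            # Step 3: keep the trial only if it is at least as good.
            trial_cost = f(trial)
            if trial_cost <= cost[i]:
                new_pop.append(trial)
                new_cost.append(trial_cost)
            else:
                new_pop.append(pop[i])
                new_cost.append(cost[i])
        pop, cost = new_pop, new_cost
    best = min(range(pop_size), key=lambda i: cost[i])
    return pop[best], cost[best]


def toy_objective(p):
    """Hypothetical stand-in for the model error, minimal at (1.5, 0.3)."""
    return (p[0] - 1.5) ** 2 + (p[1] - 0.3) ** 2


best_params, best_cost = differential_evolution(
    toy_objective, bounds=[(0.0, 5.0), (0.0, 5.0)])
print(best_params, best_cost)
```

Because the search is stochastic, repeating the run with different seeds and keeping the best result, as done in this study with ten runs, reduces the risk of reporting a poor local solution.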

Average relative error

The average relative error (ARE) is defined as follows.

$$ARE = \frac{1}{n} \sum_{i=1}^{n} \frac{\left| Y^{sim}(t_i) - Y^{exp}(t_i) \right|}{Y^{exp}(t_i)},$$

where $Y^{sim}(t_i)$ and $Y^{exp}(t_i)$ are the simulated and experimental values of the protein at time point $t_i$ and n is the number of samples. In this study, n=6.
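
As a worked illustration of the ARE, the following sketch evaluates it for six hypothetical time points. The values are invented for the example and the function assumes strictly positive experimental values.

```python
def average_relative_error(sim, exp):
    """Mean of |sim - exp| / exp over all time points."""
    return sum(abs(s - e) / e for s, e in zip(sim, exp)) / len(exp)


# Six hypothetical time points (n = 6, as in the study).
exp_values = [1.0, 2.0, 4.0, 5.0, 8.0, 10.0]
sim_values = [1.1, 1.8, 4.0, 5.5, 7.2, 10.0]
are = average_relative_error(sim_values, exp_values)
print(are)
```

Four of the six points are off by 10 % and two match exactly, so the ARE is 0.4/6, or about 0.067.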

Sensitivity analysis

Sensitivity analysis is a useful way to investigate the effects of parameter variations on the model outputs. We formulate the sensitivity $s_i(t)$ of the i-th model output to parameter P at time t as follows:

$$s_i(t) = \frac{\partial O_i(t) / O_i(t)}{\partial P / P} \approx \frac{\left| O_i(P + \Delta P, t) - O_i(P, t) \right| / O_i(P, t)}{\Delta P / P},$$

where $O_i(t)$ is the i-th model output at time t, P is the parameter and ΔP is a small perturbation of P.

We then define the sensitivity $S_i$ of the i-th model output with respect to parameter P as follows:

$$S_i = \frac{1}{n} \sum_{t=1}^{n} s_i(t),$$

where n is the number of samples (time points).
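
The finite-difference form of the sensitivity can be sketched as below. The model used here, O(P, t) = exp(−P·t), is a hypothetical single-output stand-in for the ODE system (it is the solution of dx/dt = −P·x with x(0) = 1), and the 1 % relative perturbation is an assumption.

```python
import math


def sensitivity(model, param, times, rel_step=0.01):
    """Time-averaged relative sensitivity of model(param, t) to param."""
    delta = rel_step * param
    values = []
    for t in times:
        base = model(param, t)
        perturbed = model(param + delta, t)
        values.append((abs(perturbed - base) / base) / (delta / param))
    return sum(values) / len(values)


def decay(p, t):
    """Hypothetical model output: first-order decay with rate p."""
    return math.exp(-p * t)


S = sensitivity(decay, param=0.5, times=[1.0, 2.0, 3.0, 4.0])
print(S)
```

For this model the exact relative sensitivity at time t is P·t, so the time average over the four points is 1.25; the finite-difference estimate is slightly smaller because of the finite perturbation.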

Enrichment analysis

We conducted a functional enrichment analysis of the network based on GO Biological Process (BP) terms and KEGG pathways with the DAVID bioinformatics database [45]. The enrichment significance was determined by the DAVID tool, and the P-values were then corrected for the false discovery rate (FDR). All the proteins in the network other than IAV were mapped to the DAVID database; for each complex, one member of the complex was mapped. The criterion for statistically significant enrichment was an FDR-adjusted P-value of less than 0.002.
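
The idea behind an enrichment P-value and its FDR correction can be sketched generically. The code below uses a plain hypergeometric tail probability and the Benjamini-Hochberg adjustment; this is an illustrative assumption and not a reproduction of DAVID's own scoring or correction, whose details may differ. All numbers are invented.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, K of which carry the term."""
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, upper + 1)) / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 8 of 40 network genes carry a term that annotates
# 200 of 20000 background genes.
p_term = hypergeom_tail(k=8, n=40, K=200, N=20000)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_term, adjusted)
```

A term would then be reported as enriched when its adjusted value falls below the chosen cut-off, which is 0.002 in this study.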

Results

The optimized IRN based on the experimental data

The initial and simplified IRNs (Figure 3) were constructed using IPA software and the PCA-CMI algorithm, respectively (see Methods). To further optimize the network according to the experimental data, we first estimated all parameters in our nonlinear ODEs with the DE algorithm (see Methods). The DE algorithm was run ten times, and the best parameter set obtained is listed in Additional file 4: Table S2.

Second, we deleted further nodes and edges to simplify the IRN according to the following rules. If the optimal value of the kinetic parameter $k_{i\_j}$ was zero, we deleted the corresponding directed edge, because a zero value indicates that biomolecule j does not regulate biomolecule i in the network. Furthermore, if no edge remained connected to biomolecule i, we deleted node i from the network. Finally, if node i had been deleted, its degradation rate $d_i$ was set to zero in the numerical simulation. The optimized IRN is shown in Figure 4.
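
These pruning rules amount to a simple filter on the optimized parameter set. The sketch below is illustrative only: the function name, the optional tolerance `tol`, the node "X" and all parameter values are hypothetical.

```python
def prune_network(nodes, k, tol=0.0):
    """Drop edges whose optimized parameter is zero, then isolated nodes.

    k maps (i, j) -> optimized value of k_i_j, meaning that j regulates i.
    """
    edges = {pair: value for pair, value in k.items() if abs(value) > tol}
    connected = {node for pair in edges for node in pair}
    kept_nodes = [node for node in nodes if node in connected]
    return kept_nodes, edges


# Hypothetical optimized parameters for a small sub-network.
nodes = ["IAV", "TLR3", "NFkB", "IFNb", "X"]
k = {
    ("TLR3", "IAV"): 0.5,    # IAV regulates TLR3
    ("NFkB", "TLR3"): 0.8,   # TLR3 regulates NFkB
    ("IFNb", "NFkB"): 1.2,   # NFkB regulates IFNb
    ("IFNb", "TLR3"): 0.0,   # direct edge not supported by the fit
    ("X", "IAV"): 0.0,       # only edge of the hypothetical node X
}
kept_nodes, kept_edges = prune_network(nodes, k)
print(kept_nodes, sorted(kept_edges))
```

In this toy case the direct TLR3 to IFNb edge disappears while the path through NFkB remains, and node X is dropped because its only edge has a zero parameter.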

Figure 4 The optimized inflammatory regulatory network. Lines ending in arrows and bars denote positive and negative regulatory interactions, respectively. Lines without arrows or bars represent binding interactions. Dashed lines indicate the false positive interactions identified by the proposed method. Red lines denote regulatory interactions that have been validated by biological experiments, and black lines denote interactions predicted by the proposed method. Significant and non-significant regulations are denoted by thick and thin lines, respectively.

Based on the optimal parameters, we performed a numerical simulation for all nodes in the network for comparison with the experimental data. The dynamical processes of 8 key proteins are plotted in Figure 5, and those of the other proteins are displayed in Additional file 5. The average relative errors (AREs) of 98% of the proteins are less than 0.3, and those of the remaining 2% are within the interval [0.3, 0.7] (Figure 6). These results indicate the fidelity of the obtained IRN. In addition, from the dynamical viewpoint, sensitivity analysis of the ODE model is important for quantifying the reliability of the parameters (the regulatory strengths between two genes) in the model (see Methods). The sensitivity analysis showed that the concentrations of the proteins are not sensitive to perturbations of the parameters (Figure 7), which indicates the reliability of the obtained IRN.

Figure 5 Comparisons between the numerical simulation results and experimental data of IAV, IL32, IFN-β, TLR3, CCL16, CD40, TNF and NFκB. The blue and red lines denote the experimental and simulation results, respectively. The stars represent experimental data at each time point. The experimental errors are also plotted as short bars at each time point.

Figure 6 The distribution of the average relative errors (AREs) for the numerical simulation of the proteins in the optimized network. The y-axis represents the number of proteins whose AREs fall into the corresponding bins.

Figure 7 The sensitivity analysis for the parameters in the ODE models. The x-axis shows the outputs (proteins) of the model and the y-axis shows the calculated sensitivity.

Prediction of regulatory interactions in IRN

Among the regulatory interactions in the optimized network, 45 interactions have been reported in the literature and are represented by red lines in Figure 4. In addition, 37 new regulatory interactions were predicted from the network and are denoted by black lines in Figure 4. Furthermore, the statistical significance of these regulations between paired proteins was tested using the method presented in the literature [15, 20]. The significant and non-significant regulations are denoted by thick and thin lines in Figure 4, respectively, and their numbers are summarized in Table 2. The results demonstrate that most of the predicted regulatory interactions, like the experimentally validated interactions, are statistically significant.

Table 2 The number of significant and non-significant regulations

The presence of false positive interactions is a common problem in network inference. One source of false positive interactions is indirect effects: in a cascade A→B→C with an additional edge A→C, the regulation of C by A may be mediated by B, so the direct regulatory interaction A→C may be a false positive. Comparing the optimized IRN with the initial IRN, we also identified 8 false positive interactions, which are shown by dashed lines in Figure 3. For example, the regulation of IFN-β by TLR3 and of IL6 by TNF takes effect during IAV infection through intermediate molecules. In our work, we found that TLR3 regulates IFN-β through NFκB signaling, which is consistent with previous findings. The TLR3-induced NFκB signaling pathway is triggered by the virus, and NFκB regulates expression of the proinflammatory molecule IFN-β in the immune response [46]. We also found that the regulation of IL6 by TNF is mediated by the activation of CCL2 or CD14. The interactions in the optimized network are further classified, and detailed descriptions are presented in Additional file 6: Table S3.

Identification of the important biological processes and pathways

To gain further insight into the biological interpretation of the optimized IRN during IAV infection, we performed GO Biological Process (BP) term and KEGG pathway enrichment analyses of all the species other than IAV in the optimized IRN (see Methods). Functional enrichment analysis of the species was conducted using DAVID [45]. The annotation analysis shows enrichment in BP terms involved in the defense response, inflammatory response, immune response and regulation of cytokine production. The top 10 enriched BP terms are listed in Table 3, and the full list of significantly enriched BP terms (FDR<0.002) is available in Additional file 7. In addition, network ontology analysis (NOA), which annotates biological networks [47], was used to analyze the enriched functions of the optimized IRN. The results of NOA are shown in Additional file 7. The functions of the optimized IAV-induced IRN are enriched in the response to stimulus, immune system process, inflammatory response, response to wounding and positive regulation of cellular processes, which is similar to the results obtained with DAVID. These functional annotations indicate that the reconstructed IRN is biologically reasonable and that the defense response, immune response, response to wounding and regulation of cytokine production are the important processes of the IAV-induced inflammatory response.

Table 3 Top 10 significantly enriched GO terms

The KEGG pathway enrichment analysis identified a total of 8 pathways significantly enriched in this network (FDR<0.002). These pathways are shown in Table 4. Among them, the Toll-like receptors, the RIG-I-like receptors and the NOD-like receptors have been shown to be the main pattern-recognition receptors (PRRs) by which the innate immune system recognizes influenza virus infection [48]. Moreover, the NOD-like receptors play a primary role in host defence against invading pathogens and in regulating NFκB signalling, IL1β production and cell death, indicating that they are crucial to the pathogenesis of a variety of inflammatory human diseases [49]. The cytokine-cytokine receptor interaction and Jak-STAT signalling pathways are also well-known antiviral response pathways [50, 51].

Table 4 Significantly enriched KEGG pathways

Three additional identified pathways have not been demonstrated to be associated with IAV infection. The intestinal immune network for IgA production is significantly enriched (FDR=1.07e-05). Some researchers have reported that serum IgA is an inflammatory antibody that interacts with FcαRI on effector immune cells and may function as a second line of defence by eliminating pathogens that have breached the mucosal surface [56, 57]. The detection of cytosolic DNA is related to the induction of IFN-α/β and other pro-inflammatory cytokines [58–61]. Cytosolic DNA has also been shown to activate TBK1, IRF3 and the caspase-1-dependent maturation of IL-1β and IL-18 [58, 62]. Allograft rejection is also significantly enriched. Some authors have reported that influenza infections are associated with allograft rejection, but there is no evidence that IAV triggers acute allograft rejection episodes [63–65]. In our results, the allograft rejection pathway is significantly enriched under IAV stimulation. These three pathways lack direct literature support but may be promising novel pathways that need experimental validation.

Discussion

The induction of pro-inflammatory cytokines such as COX-2, TNF, IFNs, IL27 and CXCL10 is essential for the host immune response during virus infection, but inappropriately sustained induction causes cytokine storms, which are associated with a wide variety of infectious diseases [66, 67]. Because of the complexity of the inflammatory response, its underlying mechanisms need to be studied with a network approach. In this study, we proposed a computational method based on a nonlinear ODE model to construct a cell-specific IRN during IAV infection. The main contributions of this study are threefold. First, we built a large-scale nonlinear ODE model of the network comprising 50 equations and 192 kinetic parameters. Most model-based studies for inferring networks are based on linear ODE models or discrete models [14–16]; the linear ODEs are approximated by difference equations or simplified by a steady-state assumption, and are then easily solved by classical optimization algorithms or software. However, the regulatory interactions in real biological networks are often nonlinear, so a nonlinear ODE model can better describe complicated regulatory networks. To assess the benefit of the nonlinear terms, we also described the regulatory network with a linear ODE model. The AREs of the linear model were significantly higher than those of the nonlinear model (Additional file 8: Figure S1, P-value<0.001), which supports this choice. Second, we combined the DE algorithm with a priori knowledge to refine the nonlinear ODEs and solve the nonlinear optimization problem derived from constructing the network. This problem is difficult to solve with classical optimization algorithms because it is highly nonlinear and its objective function has no explicit expression. Although the DE algorithm is a published stochastic search technique, applying it here was an iterative process: the model was optimized, improved in light of the result and optimized again. If the model is not correct, even the best optimization algorithm is useless, so our nonlinear ODE model was adjusted repeatedly. Third, we report global errors that reflect how well the reconstructed network fits the experimental data. Most studies based on linear model systems either report no errors or give only residual (local) errors, which cannot quantify the real error between the network and the experimental data.

Because our proposed method integrates gene expression data with a priori knowledge of the topological structure from the literature and IPA software, its predictions cannot be directly compared with those of published, purely data-driven methods. However, these published works may help us to find a more appropriate way to evaluate approaches that combine experimental data and a priori knowledge in the future.

An increasing number of researchers have focused on the gene expression profile of host cells infected by influenza virus [68–70]. However, most reports involve a single gene or pathway [52, 53, 71]. Few studies have focused on the systematic analysis of the regulation of the cell-signaling cascade by IAV. To understand the global regulatory mechanisms of the inflammatory response during IAV infection, we conducted a pathway enrichment analysis of the optimized IRN with the KEGG database. Our results identified several host cellular signaling pathways stimulated by IAV infection. Some of these signaling pathways are critical to the innate immune response of the host cell against influenza virus, such as the Toll-like receptor, the RIG-I-like receptor and the NOD-like receptor pathways [48, 54]. The activation of the TLR signaling pathway results in the stimulation of both innate and adaptive immune responses, and TLR agonists may represent an effective and broad-spectrum antiviral strategy to combat influenza viruses [71]. Several virus-encoded components that antagonize RLR signalling interact with and inhibit the IFN-α/β activation pathway using both RNA-dependent and RNA-independent mechanisms [55, 72].

Among the three novel pathways identified in our study, the functions of IgA have been studied [56, 57]. Secretory immunoglobulin A (SIgA) is the major antibody isotype present in mucosal secretions and has many functional attributes, both direct and indirect, serving to prevent infective agents such as bacteria and viruses from breaching the mucosal barrier [42]. Many DNA sensors have been reported, such as IFI16, RNA Polymerase III, DAI, AIM2, NLRP3, LRRFIP1 and DDX9/36. They play an important role in IFN-α/β and cytokine production [54, 58, 73]. For example, IFI16 can induce the inflammasome in response to Kaposi’s sarcoma-associated herpesvirus infection and act as a mediator of the anti-inflammatory actions of type I IFNs [73–76]. AIM2 triggers the assembly of the inflammasome, culminating in caspase-1 activation, IL-1β maturation and pyroptotic cell death [77]. LRRFIP1 has been shown to contribute to the production of IFN-β induced by vesicular stomatitis virus (VSV) and Listeria monocytogenes in macrophages [78]. This evidence indicates that DNA sensors play an important role in virus infection. However, these results need further biological experimental verification.

It should be noted that gene expression levels were used in place of protein activity profiles in this study, because protein activity data are not yet easily obtained. Therefore, there may be some discrepancy in the modelled network. In addition, the network we constructed does not involve RNA components, such as target mRNAs, micro-RNAs (mi-RNAs) or other non-coding RNAs, which may also modulate signals at many steps. Recent studies have provided evidence of a potential role for long non-coding RNAs (lncRNAs) in regulating inflammatory gene expression [79, 80], and emerging evidence implicates mi-RNAs in the regulation of inflammatory responses [81–85]. A better understanding of the molecular mechanisms of the inflammatory response during IAV infection will require the challenging task of constructing inflammatory regulatory networks that integrate different types of data, such as gene expression data, protein activity profiles, mi-RNA expression profiles and ChIP-seq data.

Conclusions

A cell-specific IRN in IAV infection has been constructed with the proposed method. Furthermore, 37 new regulatory interactions were predicted, and 8 false positive interactions in the IRN and 3 novel pathways were identified. These findings provide insight into the mechanism of the inflammatory response in IAV infection. Understanding the pivotal role of signaling pathways during IAV infection may suggest new therapeutic strategies for the control of virus infection and the inflammatory response. Our findings also have significant implications for the development of biomarkers for infectious disease.