Background

The advancement of gene profiling techniques has expanded the genomics research from a single gene analysis to the analysis of genome-wide gene expression data [1, 2]. The result from genome-wide gene expression data analysis is typically further processed for pathway analysis to investigate the association between a set of genes or proteins and phenotypes such as metabolism [3], gene regulation [4] or signal transduction [5]. Pathway analysis produces the global landscape of cellular process [6], which cannot be derived from a list of differentially expressed genes (DEGs). Especially, understanding the dynamics of pathways helps identify biological processes triggered by a specific condition [7, 8] or elucidate a different mechanism among multiple phenotypes [9, 10].

A lot of efforts have been made to define sets of genes that perform key roles for common mechanisms. As a result, a number of databases have been developed to curate sets of genes as pathways [11]. KEGG is the most widely used pathway database and it also provides graphical representations for molecular interactions in pathway [12]. REACTOME [13] and NCI-PID [14] are also well curated pathway databases used for many research projects. Pathway databases facilitate gene set analysis and help researchers to understand biological process.

With gene expression profiling techniques and well curated pathway databases, gene expression data is now routinely analyzed in terms of biological pathways. Over the years, a number of tools for pathway analysis have been developed and they can be categorized as i) over-representation analysis (ORA), ii) functional class scoring (FCS), and iii) pathway topology (PT) based approach [15]. ORA methods select a gene set (e.g. DEGs) from expression data and statistically evaluate the proportion of the gene set in terms of biological pathways. Fisher’s exact test or Chi-square are widely used to perform the ORA based analysis tasks [16]. FCS methods assign gene-level statistics to each gene, and aggregate them into the pathway-level statistics. Gene set enrichment analysis (GSEA) is a representative FCS method that determines whether a set of genes that are predicted to share a common biological function are randomly distributed or over-represented either at the top or bottom of the ranked list [17]. PT-based methods use the topology of a pathway where genes are nodes and their interactions are edges. For instance, CliPPER selects significant pathways based on the network represented as the mean and covariance matrix and determines fraction of signaling paths that are correlated with phenotypes [18].

Recently, time-series data has been considered as important key resources to understand the dynamics of biological mechanism over time and the number of datasets or research projects producing time-series gene expression data has increased dramatically [19]. Thus, several pathway analysis methods for time-series gene expression data have also been developed recently. For example, Time-series RNA-seq Analysis Package (TRAP) analyzes time-series gene expression data and identifies significant pathways with regard to the propagation difference of gene expression between two different conditions [20].

Motivation

Pathway analysis from gene expression data using these tools identifies which biological pathways are important to understand the context of data or research being investigated (e.g. phenotype). However, there is no guarantee that all pathways selected by pathway analysis are relevant to the context [21]. One of the major reasons for this inconsistency is existence of overlapped genes among multiple pathways [22, 23]. Table 1 shows how many genes belong to multiple pathways in KEGG pathway database. Among 6,972 genes participating in 295 homo sapiens pathways of KEGG, more than half of the genes belong to more than two pathways. As an example, a single gene, MAPK1, is involved in 85 pathways. These overlapped genes among multiple pathways make some pathways significant regardless of the relation with the context, concurrently. Thus, the result of pathway analysis can produce pathways that are not related to the context being investigated.

Table 1 The number of involved pathways for each gene in KEGG pathway database

One effective way to verify whether a significant pathway is actually related to the context or not is to search the literature information. If some literatures support specific relationship between the pathway and the context of data, we can be more confident to choose the pathway as one significantly expressed and truly related to the context. Thus, our goal in this study is to come up with a computational framework to combine pathway analysis of gene expression data and the literature information to select pathways relevant to the the context of the experimental condition, typically control vs. treated. To describe our research clearly throughout the paper, we introduce two concepts as below.

  • Significance: It is to measure the correlation between a pathway and the context from gene expression data[24]. Currently existing pathway analysis tools evaluate which pathways are significant in characterizing phenotype, using it in terms of scores like p-values. In this paper, this concept is defined as significance and it will be used throughout the manuscript.

  • Relevance: It means how a pathway is truly associated to the context [25] and one practical method to measure it is to exploit literature information reporting the observations or evidences of association between the pathway and the context. It is stronger than significance since significance is simply to measure how much correlation exists between the pathway and the context through expression values, while relevance requires direct observations or evidences that the pathway is actually related to the context. In this paper, this concept is defined as relevance and it will be used throughout the manuscript.

The goal of this study is to come up with a computational method to combine both significance and relevance. In particular, the integration of these two concepts is to combine analysis of gene expression data (significance) and the literature information based on the contextual information provided by the user (relevance). Figure 1 shows the overview of the proposed method. The significance of pathways is calculated using existing pathway-based gene expression data analysis tools. The relevance of pathways is obtained from the literature search upon keywords that are provided by the user to specify the context of the experiment. Scores for significance and relevance are combined into a single score by summing up the two scores as a weighted sum.

Fig. 1
figure 1

The overview of our proposed method. The significance of pathways are obtained from pathway analysis using gene expression data. The relevance of pathways are retrieved from literature information. Then, significance and relevance are integrated into a single score in a weighted sum. The integrated score is used to prioritize pathways, considering significance and relevance, simultaneously

Methods

For the implementation of the proposed concept, we used an existing pathway analysis method and a literature search tool. TRAP [20] is selected for pathway analysis. TRAP uses method that combines the ORA and PT-based approaches to find significant pathways from KEGG pathway database and it is also designed for time-series gene expression data. For the literature search, Biomedical Entity Search Tool (BEST) [26] is used. BEST uses the concept of Maximal Coherent Semantic Unit for indexing keywords to associate the keyword and literatures efficiently. Using BEST, users can specify the contextual information by specifying a set of keywords for the biological experiments that generated data for analysis. BEST returns biological entities with entity scores as a result of literature search. The entity score is computed by considering various factors such as the publication date, the number of citations and impact factors of journals.

The integration of significance and relevance can be easily done with these two tools. Integrating TRAP and BEST, we implemented ContextTRAP as in Fig. 2. Time-series gene expression data and keywords representing a context of data are given as inputs. TRAP analyzes time-series gene expression data to obtain the significance of pathways. Using keywords, BEST is used to obtain the relevance between a pathway and the context as an entity score.

Fig. 2
figure 2

The workflow of ContextTRAP. Time-series gene expression data is given to TRAP and keywords representing the context are given to BEST as input. ContextTRAP incorporates TRAP and BEST by integrating the analysis results from both tools. The entity score derived from BEST with keywords is converted to Context Score (CS), which measures the relevance of pathway. For that, the concept of posterior probability and Bayes’ rule are used. The p-value, the result of TRAP, is transformed and normalized to Impact Score (IS) measuring the significance of pathway. Then, Discovery rate is automatically determined as a weight for integration of two scores, depending on the results of TRAP and BEST. Finally, the integrated score is used to prioritize pathways in terms of both significance and relevance

To quantify and integrate significance and relevance, we introduce two score terms. Using the concept of posterior probability, Context Score (CS) measures the relevance of pathway from the entity score of BEST. Meanwhile, Impact Score (IS) measures the significance of pathway from p-value result of TRAP analysis for time-series gene expression data. Then, CS and IS are integrated via an automatically assigned weight called Discovery rate.

In this section, we introduce how to transform the result of BEST to CS, how to transform the result of TRAP to IS, and how to calculate Discovery rate that will be used as weight to integrate of CS and IS. In addition, we introduce the method of Pathway Set Enrichment Analysis (PSEA) to evaluate the result of ContextTRAP in Section “Accuracy of discovery rate estimation”.

Context score

The set of pathways to be analyzed is denoted as π={π 1,…,π n }. π has n pathways and i-th pathway is denoted as π i . A keyword is needed to be specified as an input for BEST which is the contextual information related to the context like the experimental condition or the phenotype which the research wants to investigate (e.g. disease, symptom, or gene). This keyword is denoted as k.

Context Score ( CS(π i ,k)) measures how many research or articles reported some association of the pathway π i and the keyword k. C S(π i ,k) can be computed as a form of posterior probability like Eq. (1). By entering k into BEST as a search word, a list of entities related to k is retrieved with an entity score of each entity. Then, C S(π i ,k) is calculated easily from the entity score of π i by finding π i from the retrieved list of entities.

$$ CS(\pi_{i}, k) = P(\pi_{i}|k) $$
(1)

However, some pathways are not included as entities in BEST, probably because the current literatures, though increasing rapidly, are yet to be complete enough to cover all biological pathways. Thus, it is difficult to get the entity score for whole pathways by Eq. (1). To handle this problem, we converted P(π i |k) to Eq. (2) using the Bayes’ rule. Then, we compute likelihood P(k|π i ) instead of P(π i |k), meaning that retrieves the entity score of k entering π i as a search word into BEST. Considering this concept, if user select k from the entity list of BEST in advance, C S(π i ,k) of any pathway can be obtained from the entity score of k.

$$ P\left(\pi_{i}|k\right) = \frac{P\left(\pi_{i}\right)P\left(k|\pi_{i}\right)}{P(k)} =\frac{P\left(\pi_{i}\right)P\left(k|\pi_{i}\right)}{\sum_{j=1}^{n}{P\left(\pi_{j}\right)P\left(k|\pi_{j}\right)}} $$
(2)

Given a pathway π i that is used as a search word for BEST, the entity score of k is transformed to P(k|π i ) as Eq. (3). It is a logarithm of (s i +1) to base b. s i is the entity score of k derived from using π i as search word and b is s max +2, where s max is the maximum of all s i . Logarithm is a monotonically increasing function, so a higher entity score is mapped to a higher probability. Because the base of logarithm is s max +2, the highest entity score is mapped to a probability approximated to 1. Thus, Eq. (3) converts entity scores of k for each of pathways ranged to [0,s max ] to a probability with a range [0,1). In addition, this function is concave down, which means the rate of increase is larger for smaller entity scores but it decreases and converges gradually. Thus, use of the logarithm function effectively makes densely distributed small entity scores widely scattered while big entity scores remain distinguishable from the small entity scores.

$$ P\left(k|\pi_{i}\right) = \log_{b}{\left(s_{i} + 1\right)} $$
(3)
$$ P\left(\pi_{i}\right) = \log_{b}{\left(t_{i} + 1\right)} $$
(4)

Prior probability P(π i ) is calculated using Eq. (4), which is similar to Eq. (3). P(π i ) represents the prevalence of π i in all possible keywords, and this prevalence can be acquired by t i derived from summing up all scores of entities related to π i in BEST. Because obtaining scores of all possible entities is a time-consuming work and scores of low-ranked entities have very small values, t i is approximated by entering π i into BEST and adding top ten entity scores in three categories (gene, disease, and pathway), respectively. Then, b is calculated as t max +2 to make P(π i ) range from 0 to 1, where t max is the maximum of all t i .

In some cases, multiple keywords are needed to cover the context of data. Then, final C S(π i ,k) of multiple keywords is obtained by calculating the average of C S(π i ,k) derived by each keyword.

Impact score

Impact Score ( IS(π i )) measures how significant pathway π i is as a result of gene expression data analysis. To handle time-series data, ContextTRAP uses TRAP for analyzing gene expression data. Before deriving I S(π i ), we define I S (π i ) like Eq. (5), using the p-value of each pathway from the result of TRAP. Because the p-values of significant pathways are very small near to zero, distribution of p-values has to be scattered and expanded. Thus, a negative logarithm to base 10 is applied for negating the density of p-values.

$$ IS^{*}(\pi_{i}) = -\log_{10}{pvalue_{i}} $$
(5)

After applying the logarithm-based transformation, a min-max normalization adjusts the maximum (or minimum) of I S (π i ) to the maximum (or minimum) of C S(π i ,k) like Eq. (6). Then, I S(π i ) can have an equal influence as C S(π i ,k) when integrated into a single combined score.

$$ {}\begin{aligned} IS(\pi_{i})\! = \left(IS^{*}(\pi_{i})-\min{IS^{*}}\right)\frac{\max{CS}-\min{CS}}{\max{IS^{*}}-\min{IS^{*}}}+\min{CS} \end{aligned} $$
(6)

Discovery rate

IS(π i ) and CS(π i ,k) are normalized in the same scale so that the integration of the two scores can be easily done. Two scores are integrated by a weighted sum as in Eq. (8), so it is important to select a proper weight for integration reflecting the importance of IS(π i ) and C S(π i ,k). Discovery rate (w θ ) is a dynamically assigned weight as in Eq. (7) to reflect the importance of the significance and the relevance. It is the proportion of pathways whose p-value is bigger than a specific threshold θ and C S(π i ,k) is bigger than zero. It represents the ratio of pathways that are not selected as significant by TRAP but discovered as meaningful in BEST. In other words, Discovery rate shows how many pathways are missed by the gene expression analysis but have been investigated and reported in the literature v.s. all the pathways. n is the total number of pathways and θ means a threshold of p-value for selecting insignificant pathways from TRAP and 0.05 is used in this paper, since p-value of 0.05 is a widely used cutoff value for indicating statistical significance. Discovery rate reflects some characteristics about data. 1) If the user-defined context is well supported by the literature and BEST can cover many pathways related to the context, Discovery rate will be bigger to increase the effect of C S(π i ,k) and vice versa. 2) Although BEST finds many pathways with the context, if TRAP covers most of the pathways detected by BEST, the importance of C S(π i ,k) decreases.

$$ w_{\theta} = \frac{\sum_{i=1}^{n}{I(pvalue_{i} > \theta \ and \ CS(\pi_{i}, k) > 0)}}{n} $$
(7)

The final score of π i is derived by combining C S(π i ,k) and I S(π i ) with Discovery rate as in Eq. (8). Using this score, significance of pathway is re-estimated.

$$ Score(\pi_{i}, k) = w_{\theta}CS(\pi_{i}, k) + (1-w_{\theta})IS(\pi_{i}) $$
(8)

Pathway set enrichment analysis

To evaluate the pathway list determined by ContextTRAP in Section “Accuracy of discovery rate estimation”, we propose Pathway Set Enrichment Analysis (PSEA), a modified version of Gene Set Enrichment Analysis (GSEA)[17] at the pathway level. GSEA is a method to determine whether a set of genes is significant or not, while PSEA measures whether a ranked list of pathways is significant or not. From a pathway list ranked by Eq. (8), PSEA calculates Rank Score (RS(i)) at rank i like Eq. (9). In the ranked list, R S(i) is increased or decreased from R S(i−1), depending on whether π (i) is relevant pathway or not, where π (i) is i-th pathway in the ranked list. In this paper, relevant pathways are defined as π by DAVID [27, 28] analysis using a specific gene set that was validated or reported to be related to the context in original paper, while the rest of the pathways are defined as π . Firstly, R S(0) is initialized to 0. Then, if π (i) is involved in π , R S(i) is increased by a proportion of r i which means the impact of rank i, where r i is ni+1. On the other hand, if π (i) is member of π , R S(i) is decreased by a reciprocal of the number of π . Enrichment Score (ES) is the maximum of RS, which measures how significant the pathway list is. The higher the ranks of overall π , the higher the ES score is.

$$ RS(i)= \left\{ \begin{array}{ll} 0, & i = 0 \\ RS(i-1) + \frac{r_{i}}{\sum{}^{}{r}},& \pi_{(i)} \in \pi^{*} \\ RS(i-1) - \frac{1}{\left|\pi^{-}\right|}, & \pi_{(i)} \in \pi^{-} \end{array}\right. $$
(9)
$$ ES = \max_{i}{RS(i)} $$
(10)

Results and discussion

In this section, we evaluated the performance of ContextTRAP by analyzing two public datasets. Two datasets have time-series gene expression data measured in control vs. treatment experiments. Using these datasets, we evaluated the performance of ContextTRAP in various ways. Firstly, we tested whether the literature information truly supports the relevance, using various keywords, each of which has different strength of representing the context of data. Secondly, Pathway Set Enrichment Analysis (PSEA) introduced in Section “Pathway set enrichment analysis” was used to verify whether Discovery rate estimates a proper weight or not for integration. Next, Kolmogorov Smirnov (KS) test was used to compare the performances of ContextTRAP and the original TRAP in terms of prioritization. Finally, we compared ContextTRAP with original TRAP and also with three existing pathway analysis methods, one from each of major pathway analysis categories of ORA, FCS and PT-based methods, in terms of F1 score. Additionally, we investigated the biological importance of the prioritized pathways in ContextTRAP.

Data processing

To evaluate the performance of ContextTRAP, two public datasets are selected. i) H5N1 identified signaling networks affected by highly pathogenic avian influenza H5N1 infection [29]. ii) Th17 identified regulatory networks controlling the T H 17 cell differentiation triggered by TGF- β1 and IL-6 [30]. Raw data of two datasets have been deposited in the Gene Expression Omnibus(GEO) database under access links http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE28166 for H5N1 and http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE43955 for Th17.

Firstly, keywords representing the context of data are necessary to obtain CS. It should be one of the entities of BEST, thus we selected the simplest and the most relevant keywords from the entity list in BEST. In Table 2, keywords used for each dataset are shown. Influenzas is used as a keyword for H5N1, while TGFB1 and IL6 are used as keywords for Th17. In the web version of ContextTRAP, user can search and select keywords from the entities of BEST to specify the experimental condition of the input dataset.

Table 2 Datasets used to evaluate contextTRAP

To define the pathways which are relevant to the context of each dataset, DAVID [27, 28] analysis was performed using gene sets validated and reported in experiments that generated each dataset. The gene sets and the selection criteria of the gene sets are provided in additional file [see Additional file 1]. With these gene sets, 15 pathways from H5N1 and 29 pathways from Th17 are selected as relevant pathways of each dataset. In this paper, these relevant pathways are represented as π and the rests are represented as π . Then, the purpose of our research is prioritizing π from pathway analysis result. The list of π for each dataset is provided in additional file [see Additional file 2].

For running TRAP algorithm, we selected time-lag factor, parameter to adjust the ratio of influence from the previous time point, as 1. Also, cutoff value, threshold to find DEGs by fold-change as logarithm, was selected as 1, according to the default values.

Using the result of BEST and TRAP, Discovery rate was calculated automatically: 0.30 for H5N1 and 0.21 for Th17. The result of data processing is shown in Table 2. H5N1 is time-series gene expression data of Homo sapiens and has 295 KEGG pathways in total referred as |π|. Th17 is for Mus musculus and has 291 KEGG pathways. The |π | means the number of π which are context-relevant pathways retrieved from DAVID.

The effect of relevance between keyword and the context of data

To show whether the literature information can represent the relevance, we tested whether the distribution of π in pathway list returned from BEST is different in accordance with the degree of association between keyword and context of data. Figure 3 shows pathway list sorted by the result returned from BEST with various keywords for each dataset.

Fig. 3
figure 3

The comparative result of literature search from BEST with various keywords. a is the result of H5N1. Alcohol dependence, Infectious diseases, Influenzas are selected as keywords depending on the relevance to the context. Boxplot of each color shows the distribution of rank of relevant pathways (π ) in pathway list sorted by BEST score returned with each keyword and color marking in the bottom gray boxes is the position of π in pathway list. It shows relevant pathways rank higher when more relevant keyword is given to BEST. b is the result of Th17. Alcohol dependence, TGFB1, IL6 and combination of TGFB1 and IL6 are used as keywords. It shows that combination of multiple relevant keywords can make better performance than using a single keyword

Figure 3(a) is the result of H5N1 using Alcohol dependence, Infectious diseases and Influenzas as keywords. Influenzas is the best keyword well representing the context of H5N1 and Infectious diseases that is more comprehensive term than Influenzas was selected as secondly relevant keyword. Alcohol dependence was selected to show the result of keyword having little relevance with the context. In the bottom of the figure, each color marker shows the position of π in 295 pathways ordered by BEST score derived from each keyword. Boxplot above shows the distribution of rank of π in pathway list. In Influenzas which is most relevant to H5N1, π are mainly positioned at the front of pathway list. It means Influenzas well represents the context of H5N1. As a secondly relevant keyword, Infectious diseases shows similar result with Influenzas, but π are mainly positioned at lower rank than Influenzas. In Alcohol dependence, most π are distributed in backward of pathway list. It suggests that the result of literature search with a keyword that well reflects the context of data makes reasonable pathway scores representing the relevance between pathway and the context.

Figure 3 b shows the result of Th17 using Alcohol dependence, TGFB1, IL6 and combination of TGFB1 and IL6 as keywords. Like Fig. 3 a, 291 pathways ordered by the result of BEST with various keywords are represented in x-axis and π are marked in color. The result of Alcohol dependence which is a irrelevant keyword shows that π are uniformly distributed on the pathway list without any tendency. Two mainly relevant keywords, TGFB1 and IL6, show good performance where π are focused on top of list. When two relevant keywords are used together like a combination of IL6 and TGFB1, however, the result is improved than the result of single keyword. It seems that combined keywords mutually supplement search results and make synergy from each keyword.

Accuracy of discovery rate estimation

To show how well Discovery rate, the weight in Eq. (8), is set automatically, we used Pathway Set Enrichment Analysis (PSEA) introduced in Section “Pathway set enrichment analysis”. The weights increased by 0.2 from 0.0 to 1.0 were compared with Discovery rate of each dataset. A weight of 0.0 indicates that the TRAP analysis result is solely used and a weight of 1.0 indicates that the BEST analysis result is solely used. Figure 4 shows the results of PSEA for (a) H5N1 and (b) Th17. In Fig. 4 ab, the positions of π in the sorted pathway list determined by ContextTRAP are indicated in x-axis at 7 different weights, π in gray and π as bars in color other than gray. The line graph shows a change of R S(i) through x-axis from the first rank to the last rank.

Fig. 4
figure 4

The result of pathway set enrichment analysis. ab show PSEA results of H5N1 and Th17, respectively. The pathway list sorted by Score(π i ,k) with various weights, from 0.0 to 1.0 increased by 0.2 and Discovery rate (0.30 and 0.21), is present in x-axis. The line graph shows Rank Score (R S(i)) of each rank i, changed through pathway list from the first rank to the last rank. Color marking in the bottom gray boxes means position of π in pathway list. cd show Enrichment Score(ES) of each dataset, which is the maximum RS

With the weight of 0.0 in Fig. 4 ab, which shows the result of the original TRAP, the π generally tend to be focused on the front of pathway list, which indicates that TRAP performed well. However, considering the literature information, ContextTRAP prioritized π more from the result of TRAP. Figure 4 cd shows ES which is the maximum RS at weights of 0.0, 0.2, 0.4, 0.6, 0.8, 1.0 and also Discovery rate. With Discovery rate, ContextTRAP improved ES compared to original TRAP, from 0.761 to 0.787 for H5N1, and from 0.726 to 0.763 for Th17. It is the highest of the results from each weight. This experiment shows that our strategy of combining relevance and significance of pathways is quite good without requiring the user to set the weight value. Note that our system determines Discovery rate automatically and it reduces the bias which can be caused by a fixed weight.

How much improvement is achieved in detecting relevant pathways in comparison with the original version of TRAP

We measured how much improvement was achieved by comparing performance of ContextTRAP that incorporated BEST and the original version of TRAP that does not utilize literature information. Using Kolmogorov Smirnov (KS) test, we can compare two distributions effectively. KS test measures the D statistic that represents the maximum distance between two empirical distribution functions of two samples. Applying the KS test, we compared distributions of π and π in sorted pathway list returned from ContextTRAP and original TRAP. If π are positioned at the head of list and π are positioned at the tail of list, D between them becomes higher. If π and π are mixed in list, on the other hand, D will be a lower value.

Figure 5 shows the results of KS test of ContextTRAP and the original TRAP for each dataset. Figure 5 a is the result of H5N1 and Fig. 5 b represents the result of Th17. The pathway list is sorted by score of each analysis and is present in x-axis of each plot. Red markers indicate the position of π in the pathway list. In each plot, points in red color represent the empirical distribution function of π and blue points indicate that of π in the pathway list returned from pathway analysis.

Fig. 5
figure 5

The result of KS test for the distribution of π and π in pathway list returned from TRAP and ContextTRAP. a is the result from H5N1 and b is the result from Th17. In the plot, red points indicate empirical distribution function of π and blue points indicate that of π . Gray box along the x-axis represents the pathway list retrieved from each method. Pathways are sorted by p-value (TRAP) and S c o r e(π i ,k) (ContextTRAP). Red markers represent the positions of π in pathway list. D statistic from KS test represents distance between two empirical distribution functions and it shows improvement in ContextTRAP incorporating the literature information

For two datasets, plots show that π are distributed at the head of pathway list and empirical distribution function of π increases more rapidly in ContextTRAP compared with original TRAP. In addition, D is higher in ContextTRAP than original TRAP. It means π are prioritized well in ContextTRAP by using literature information.

Comparison with other pathway analysis methods

We compared the performances of ContextTRAP with three other pathway analysis methods included in the graphite web server [31], also with the original TRAP. Graphite web is a web tool for pathway analysis using gene expression data, providing various analysis methods. For the comparison, three pathway analysis methods were selected, one for each of three categories of pathway analysis tools—Fisher’s exact test, GSEA [17] and CliPPER [18] represent ORA methods, FCS methods and PT-based methods, respectively. For a quantitative comparison, we calculated F1 score of a pathway list determined by each of the pathway analysis tools. F1 score is a harmonic mean of precision and recall and it is widely used to evaluate the performance of binary classification tests. In this analysis, we consider π set as a positive condition set and π set as a negative condition set in terms of true condition set. Then, pathways are predicted as positive or negative, depending on whether a pathway is classified as significant or insignificant by each of pathway analysis tools. Using the result of predicted condition set and the pre-defined true condition set, F1 score is calculated.

To split the pathway list of ContextTRAP into significant and insignificant, p-value of each pathway is calculated by permutation. P-value is derived from a distribution of permuted scores by generating all possible combinations of CS and IS of all pathways. Then, pathways having p-value under 0.05 are selected as significant for ContextTRAP. For other pathway analysis tools, we selected pathways having p-value below 0.05 as significant.

Figure 6 shows F1 scores for five methods—ContextTRAP, original TRAP, Fisher’s exact test, GSEA and CliPPER. Figure 6 a represents the result of H5N1 and Fig. 6 b represents the result of Th17. For two datasets, original TRAP shows higher F1 score than other three methods—Fisher’s test, GSEA and CliPPER. It is probably because other methods except TRAP do not consider the effect of time factor in a proper manner, handling the time-series samples as multiple replicates. However, ContextTRAP shows the highest F1 score, even higher than the score of original TRAP. It indicates that literature search realizes the prioritization of context-relevant pathways and improves the quality of significant pathway set from original TRAP.

Fig. 6
figure 6

The F1 score of five pathway analysis methods for each dataset. a is the result of H5N1 and b is the result of Th17. Fisher’s exact test, GSEA and CliPPER from graphite web were compared with ContextTRAP and also original TRAP was included in the comparison. ContextTRAP shows higher F1 score than other three methods and even than original TRAP

Biological perspective

Table 3 shows part of pathways whose ranks went up as a result of incorporating the contextual information, compared to the original version of TRAP. An additional file shows the whole list of pathway rankings [see Additional file 3].

Table 3 The list of pathways that rank higher in ContextTRAP than in TRAP

In H5N1, the rank of the pathways that are related to the immune system and affected by infection of H5N1 influenza moved up significantly. Invading to host, H5N1 viruses activate PI3K-Akt signaling pathway to increase their replication efficiency [32]. The NS1 protein of H5N1 virus binds to p85 β, a regulatory subunit consisting in PI3K, and induces activation of Akt [33]. It leads to promoting cell growth, cell cycle or other cellular processes that support replication of H5N1 [34]. In addition, nucleoprotein in H5N1 induces apoptosis in host cells for their efficiency of replication, by interacting with clusterin, antiapoptotic protein of host [35]. The hemagglutinin of H5N1 activates JAK-STAT signaling pathway associated with transcriptional activation of chemokines/cytokines genes and incurs a destructive innate immune response [36].

In Th17, pathways related to cytokines that induce differentiation of T H 17 cell moved upwards. Differentiation of T H 17 cell is induced by IL-6 and TGF- β1 together [37]. In addition, IL-23, IL-1 β and IL-21 play a role in amplifying the differentiation IL-6 and TGF- β1 [38]. Mitogen-activated protein kinases (MAPK) including ERK, JNK and p38 are related to mediation of the intracellular responses to TGF- β [39, 40]. Toll-like receptor (TLR) is also involved in differentiation of T H 17 cell [41]. TLR2 has been implicated in promoting T H 17 cell differentiation and proliferation [42].

Conclusions

Many gene expression analysis methods are available for identifying significant pathways from transcriptome data. However, these methods are often misled by many genes that are involved in a number of pathways. To address the challenge, we developed a new computational framework to combine analysis of gene expression data and the literature information based on the contextual information provided by the user as keyword. We defined two scores, Impact Score and Context Score, to measure significance from the result of pathway analysis and to specify relevance from the result of literature search, respectively.

Our novel framework was implemented as ContextTRAP by utilizing two existing tools, TRAP and BEST. We evaluated the performance of ContextTRAP with two data sets, H5N1 and Th17, in comparison with the state of the art pathway analysis tools in each of the three categories of pathway analysis tools: Fisher’s exact test representing ORA methods, GSEA [17] representing FCS methods, and CliPPER [18] representing PT-based methods. In terms of F1 scores, ContextTRAP achieved better performances than existing methods and than the original TRAP. One notable feature is that ContextTRAP automatically combines Impact Score and Context Score into a single score by setting Discovery rate dynamically. In extensive experiments with various weights, Discovery rate showed the maximum performance, which demonstrates the ability of ContextTRAP to combine analysis of transcriptome data and the literature information dynamically, depending on the level of literature knowledge related to the experiment.

We believe that ContextTRAP will be a very useful resource for the pathway based analysis of gene expression data for the time-series, since the user can specify the context of the biological experiment in a set of keywords.