A MATLAB tool for pathway enrichment using a topology-based pathway regulation score

Ibrahim, Maysson; Jassim, Sabah; Cawthorne, Michael Anthony; Langlands, Kenneth

doi:10.1186/s12859-014-0358-2

Software
Open access
Published: 04 November 2014

Volume 15, article number 358, (2014)

BMC Bioinformatics

Abstract

Background

Handling the vast amount of gene expression data generated by genome-wide transcriptional profiling techniques is a challenging task, demanding an informed combination of pre-processing, filtering and analysis methods if meaningful biological conclusions are to be drawn. For example, a range of traditional statistical and computational pathway analysis approaches have been used to identify over-represented processes in microarray data derived from various disease states. However, most of these approaches do not exploit the full spectrum of gene expression data, or the relationships and dependencies between genes. Previously, we described a pathway enrichment analysis tool created in MATLAB that yields a Pathway Regulation Score (PRS) by considering signalling pathway topology, and the overrepresentation and magnitude of differentially-expressed genes (J Comput Biol 19:563-573, 2012). Herein, we extend this approach to include metabolic pathways, and describe the use of a graphical user interface (GUI).

Results

Using input from a variety of microarray platforms and species, users are able to calculate PRSs, along with corresponding z-scores for comparison. Further pathway significance assessment may be performed to increase confidence in the pathways obtained, and users can view Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway diagrams marked up to highlight impacted genes.

Conclusions

The PRS tool provides a filter that aids the isolation of biologically relevant insights from complex transcriptomic data.

Background

High-throughput transcriptional profiling techniques (microarrays or, increasingly, RNA-seq) play a growing role in modern life-science research. Such techniques act as a molecular “camera”, taking genome-wide “snap-shots” of genetic activity. However, the effective analysis of microarray data presents a number of challenges, in particular handling the large number of genes that are studied simultaneously.

Analysing gene expression in the context of curated knowledge, or “knowledge base-driven pathway analysis”, is critical, as it reduces the search space from many thousands of genes to a subset of biological processes that are much more tractable to human interpretation [1]. According to Khatri et al [2], pathway enrichment approaches can be divided into three generations:

i. Over-representation Analysis (ORA): This scores a pathway by comparing the proportion of differentially-expressed genes (DEGs) observed in the pathway with the proportion of DEGs among all genes on the microarray. This approach is used by several pathway analysis tools, including GenMAPP [3], GoMiner [4], Onto-Express [5] and FatiGO [6].
ii. Functional Class Scoring (FCS): FCS gives a score to each gene in a pathway based on its expression, and a pathway score is then calculated from the scores of all the genes in the pathway. A number of FCS methods have been implemented in standalone tools such as GSEA [7], SigPathway [8] and SAFE [9], or in web tools such as T-profiler [10], GAzer [11] and GeneTrail [12].
iii. Pathway Topology (PT)-based approaches: These approaches exploit the topology of pathways by giving weights to pre-defined connections between genes, which inform pathway scoring. Several topology-based approaches have been described in the literature over the past few years. According to Mitrea et al [13], PT-based approaches differ in the way they translate pathway topology information into a pathway score. Some methods use only the topology data of DEGs in the enrichment score (for example MetaCore [14] and EnrichNet [15]), whereas others (including SPIA [16] and GANPA [17]) use expression data of DEGs along with the topology data. Alternatively, some methods use expression data derived from all microarray genes, whether or not they change between conditions, for example PathOlogist [18], DEGraph [19] and ACST [20]. Importantly, some PT-based tools use only signalling pathway descriptions, such as Pathway-Express [21], NetGSA [22], ScorePAGE [23], TAPPA [24], MetPA [25] and Clipper [26].
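ORA, the first generation above, is most often implemented as a one-sided hypergeometric test: given N genes on the array, K of them DEGs, and a pathway of n genes containing k DEGs, the p-value is the probability of seeing k or more DEGs in the pathway by chance. The sketch below is a minimal Python illustration of that test (Python is used for brevity; the PRS tool itself is written in MATLAB). The function name and arguments are hypothetical, and no particular ORA tool is claimed to use exactly this code.

```python
from math import comb


def ora_pvalue(total_genes: int, total_degs: int, pathway_genes: int, pathway_degs: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= pathway_degs).

    total_genes   (N): genes measured on the array
    total_degs    (K): DEGs among all N genes
    pathway_genes (n): genes in the pathway
    pathway_degs  (k): DEGs observed in the pathway
    """
    denominator = comb(total_genes, pathway_genes)
    upper = min(total_degs, pathway_genes)
    # Sum the probabilities of observing x = k, k+1, ..., min(K, n) DEGs in the pathway.
    tail = sum(
        comb(total_degs, x) * comb(total_genes - total_degs, pathway_genes - x)
        for x in range(pathway_degs, upper + 1)
    )
    return tail / denominator
```

For example, `ora_pvalue(10, 3, 3, 3)` is 1/120: all three DEGs landing in a three-gene pathway drawn from ten genes. Exact integer arithmetic (`math.comb`) avoids the underflow that naive factorials would cause on array-sized inputs.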

Previously, we proposed a new pathway enrichment method, in which both pathway topology and the magnitude of gene expression changes informed the creation of a Pathway Regulation Score (PRS) [27]. Specifically, by combining fold-change data for those transcripts exceeding a significance threshold, and by taking into account the potential of altered gene expression to impact upon downstream transcription, we identified those pathways most relevant to the pathophysiological process under investigation. Our approach addressed a number of issues that potentially compromise enrichment methods. We took steps to mitigate the influence of errors in ID mapping, and to reduce the bias introduced by highly-redundant pathways (i.e. multiple instances of the same gene). Topology methods also have to handle loops effectively, so we used a search algorithm derived from graph theory to resolve this problem. We also felt that arbitrarily dividing processes into either up- or down-regulated was artificial as changes in gene expression are likely to be distributed throughout pathways, thus ours was an overall impact assessment.

Herein, we describe the implementation of our PRS approach as a standalone tool that allows end users to import data from different microarray platforms and species. The tool yields both PRS and z-scores, provides statistical analysis, and allows browsing of pathways with impacted genes highlighted in different colours. An enhancement over our original report is that users are able to enrich both signalling and metabolic pathways.

Implementation

The PRS approach was implemented in MATLAB. Users without access to the MATLAB environment can download the MATLAB Compiler Runtime (MCR) in order to deploy the software described herein via a user-friendly GUI. The PRS interface (Figure 1) provides users with several functions:

Preprocessing microarray data

We did not re-engineer a filter to normalise data from a variety of platforms; rather, users must first preprocess transcriptomic data using one of the many existing tools. Data must be supplied as a simple Excel spreadsheet in which the first column contains probe IDs and the following columns contain normalised, replicated expression values from the control and test conditions. Users must also supply the species, the number of samples in each group, the fold-change and t-test thresholds, the normalisation method and the platform.
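A minimal Python sketch of the fold-change filtering step, assuming rows shaped like the spreadsheet described above (probe ID first, then control replicates, then test replicates) and assuming the expression values are on a log2 scale, as RMA output is. The function name and the symmetric rule (keep a probe if FC ≥ threshold or FC ≤ 1/threshold) are illustrative assumptions, not the tool's documented behaviour. The t-test filter is omitted here.

```python
from statistics import mean


def fold_change_filter(rows, n_control, n_test, fc_threshold=1.3):
    """Return {probe_id: fold_change} for probes passing the fold-change threshold.

    Each row is (probe_id, v1, ..., v_{n_control + n_test}) with control
    replicates first. Values are assumed to be log2-scale (e.g. RMA output),
    so the linear fold-change is 2 ** (mean(test) - mean(control)).
    """
    selected = {}
    for probe_id, *values in rows:
        if len(values) != n_control + n_test:
            raise ValueError(f"{probe_id}: expected {n_control + n_test} values, got {len(values)}")
        control = values[:n_control]
        test = values[n_control:]
        fc = 2 ** (mean(test) - mean(control))  # test/control ratio on a linear scale
        # Keep both up- and down-regulated probes (hypothetical symmetric rule).
        if fc >= fc_threshold or fc <= 1 / fc_threshold:
            selected[probe_id] = fc
    return selected
```

A per-probe t-test p-value could be added alongside, for instance with `scipy.stats.ttest_ind(control, test)`, and combined with the fold-change rule before pathway mapping.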

Pathway representation

Our fundamental algorithm was described previously [27]. Briefly, we used Kyoto Encyclopaedia of Genes and Genomes pathway definitions [28], which are maintained in KEGG Markup Language (KGML) format. We imported a total of 189 signalling and metabolic descriptions from KEGG and parsed these into MATLAB objects, which were then converted into directed graphs. KGML files contain three types of objects: entries, relations, and reactions. These can be mapped to graphical objects in the associated pathway map (Additional file 1). Only entries (which form nodes, represented as boxes) and relations (represented as edges) were used to represent signalling pathways, in which proteins (boxes) are linked by “relations”. All three types were used to represent the structure of metabolic pathways, in order to capture substrate-enzyme-product relationships: enzymes (boxes) are linked by “relations”, and compounds (circles) are linked by “reactions”. To convert a metabolic pathway into a graph in a rational way, we represented enzymes as nodes in the graph, while substrates and products were used to determine the direction of relations (edges) between nodes (Figure 2). While we acknowledge that it is not possible to predict any effect on flux by this rationale, we reasoned that any change in node expression in a metabolic pathway could be of physiological relevance, particularly if nodes were connected.
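The metabolic conversion above (enzymes as nodes, with edge direction taken from substrates and products) can be sketched in Python with the standard-library XML parser. This is one plausible reading, not the tool's MATLAB code. It makes three assumptions:

- each KGML `<reaction>` element's `id` refers to the entry of the catalysing enzyme;
- compounds are matched by their KEGG `name` (e.g. `cpd:C00031`), so duplicate compound entries merge;
- every reaction runs substrate to product, with reversible reactions not treated specially.

An edge A → B is added when a product of enzyme A's reaction is a substrate of enzyme B's reaction.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def metabolic_enzyme_edges(kgml_text):
    """Build enzyme -> enzyme edges from the <reaction> elements of a KGML document.

    Assumes <reaction id="..."> names the catalysing enzyme entry and that
    every reaction proceeds substrate -> product.
    """
    root = ET.fromstring(kgml_text)
    produced_by = defaultdict(set)  # compound name -> enzyme ids whose reaction produces it
    consumed_by = defaultdict(set)  # compound name -> enzyme ids whose reaction consumes it
    for reaction in root.iter("reaction"):
        enzyme = reaction.get("id")
        for substrate in reaction.findall("substrate"):
            consumed_by[substrate.get("name")].add(enzyme)
        for product in reaction.findall("product"):
            produced_by[product.get("name")].add(enzyme)

    edges = set()
    for compound, producers in produced_by.items():
        for source in producers:
            # .get() avoids inserting empty entries into the defaultdict.
            for target in consumed_by.get(compound, ()):
                if source != target:
                    edges.add((source, target))
    return edges
```

For a linear chain C1 → C2 → C3 → C4 catalysed by enzymes 10, 11 and 12, this yields the edges (10, 11) and (11, 12).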

Representing pathways as graphs had the additional advantage of reducing redundancy, since each gene was represented only once in any pathway graph. A Depth-First Search (DFS) algorithm, derived from graph theory, was used to ensure that loops were only counted once.
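The way a DFS counts loops only once can be illustrated with a short Python sketch. It assumes a node's weight is the number of distinct significant nodes reachable from it through chains of significant nodes only, with non-significant nodes breaking a chain. The visited set makes the search terminate on cycles and stops any downstream node being counted twice. This is an iterative, stack-based DFS written for illustration; the function and argument names are hypothetical.

```python
def node_weights(children, significant):
    """Weight of each significant node: the number of distinct significant nodes
    reachable from it through chains of significant nodes only.

    children:    dict mapping node -> iterable of child nodes (directed edges)
    significant: set of significant nodes (DEGs)
    """
    weights = {}
    for start in significant:
        visited = {start}  # marks nodes already seen, so loops are traversed once
        stack = [start]    # explicit stack gives depth-first traversal order
        while stack:
            node = stack.pop()
            for child in children.get(node, ()):
                # Only significant nodes are counted and traversed through.
                if child in significant and child not in visited:
                    visited.add(child)
                    stack.append(child)
        weights[start] = len(visited) - 1  # exclude the start node itself
    return weights
```

In the loop A → B → C → A with all three nodes significant, each node receives weight 2. A significant node reachable only through a non-significant node contributes nothing.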

Pathway scoring

Our method assigned weights to all significant nodes (i.e. DEGs) in a pathway to reflect their topological strength, specifically the number of significant downstream nodes that they point to, either directly or via other significant nodes, as described previously [27]. A PRS was calculated from the fold-change values and weights of all significant nodes in the pathway, and normalised for pathway size. We also calculated a z-score [29], with an improvement over earlier implementations in that this was performed after removing redundant genes from pathway descriptions. The software outputs two lists of pathways, ranked according to PRS and z-score respectively, saved as both Excel and .mat files for later analysis.
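To make the two kinds of score concrete, the Python sketch below pairs a topology-weighted score with a count-based z-score. Both are assumptions.

The first function is hypothetical and is not the published PRS formula (see [27] for that). It only shows how fold-change and node weight can combine with pathway-size normalisation: the sum over significant nodes of |log2 FC| × (1 + weight), divided by the number of pathway nodes.

The second function is the hypergeometric-based pathway z-score popularised by MAPPFinder: (r − nR/N) divided by the square root of n(R/N)(1 − R/N)(1 − (n − 1)/(N − 1)). Whether the tool's z-score uses exactly this form is an assumption.

```python
from math import log2, sqrt


def illustrative_pathway_score(fold_changes, weights, n_nodes):
    """Hypothetical topology-weighted score (NOT the published PRS formula).

    fold_changes: dict significant gene -> linear fold-change
    weights:      dict significant gene -> topological weight
    n_nodes:      number of nodes in the pathway graph (size normalisation)
    """
    total = sum(abs(log2(fc)) * (1 + weights.get(gene, 0))
                for gene, fc in fold_changes.items())
    return total / n_nodes


def pathway_zscore(r, n, R, N):
    """Hypergeometric-based z-score.

    r: DEGs in the pathway, n: genes in the pathway,
    R: DEGs overall,        N: genes measured overall.
    """
    p = R / N
    expected = n * p
    variance = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
    return (r - expected) / sqrt(variance)
```

Because the z-score depends only on counts, two pathways with the same number of DEGs score identically however strongly or centrally those genes change. A topology-weighted score is designed to separate exactly those cases.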

Pathway significance assessment

We then estimated the probability of obtaining a score at least as high as the observed PRS by chance, using a non-parametric permutation method. First, fold-change values for all expressed microarray genes were permuted. These values were then mapped back onto pathways, and the PRS recalculated. This process was repeated n times, where n is provided by the user through the interface (typically n = 1000). The statistical significance (p-value) of each pathway score was estimated by comparing the observed score with the n random scores generated. To make the significance assessment more reliable, p-values were corrected for multiple testing using a False Discovery Rate (FDR) method, with a threshold provided by the user. This is described in more detail in our original report [27].
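The permutation procedure can be sketched in Python as follows. Two choices here are assumptions, since the text does not fix them. First, the empirical p-value uses the common (1 + number of permuted scores ≥ observed) / (1 + n) estimate. Second, the FDR correction is shown as the Benjamini-Hochberg procedure, one standard FDR method. `score_fn` stands in for whatever pathway score is being tested.

```python
import random


def permutation_pvalue(score_fn, fold_changes, observed, n_perm=1000, seed=None):
    """Empirical p-value for a pathway score under fold-change permutation.

    fold_changes: dict gene -> fold-change for all expressed genes
    score_fn:     callable mapping such a dict to a pathway score
    Fold-change values are shuffled across genes, the score recomputed, and
    p = (1 + #{permuted score >= observed}) / (1 + n_perm).
    """
    rng = random.Random(seed)
    genes = list(fold_changes)
    values = list(fold_changes.values())
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # randomly reassign fold-changes to genes
        if score_fn(dict(zip(genes, values))) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Pathways would then be retained when their adjusted p-value falls below the user's threshold (for example 0.05). The +1 in numerator and denominator keeps the estimate from ever reaching exactly zero with a finite number of permutations.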

Visualizing enriched pathways

After running the analysis, results are saved as .mat files for ease of retrieval. Clicking on a pathway name in the table of ranked pathways and selecting the option to visualize a pathway displays a marked-up pathway map. Technically, the software calls a REST-based pathway mapping web service hosted on the KEGG website and passes a number of parameters, including a list of all expressed genes with their fold-changes and the colours used to differentiate DEGs from non-impacted genes. Figure 3 shows a typical pathway map in which significant (i.e. above threshold) genes are coloured red and non-significant (i.e. unchanged or not expressed) genes green.

UML for modelling and software description

Herein, we used Unified Modelling Language (UML) diagrams to describe, model and visualize the structure and functions of our method. UML 2.0 defines 14 types of diagram, classified in three categories [30]; in this paper, however, we used only two: class and sequence diagrams. Class diagrams represent the static structure, or main objects, of the software. Figure 4 shows the key classes at the pathway analysis stage. The class “Analysis” is the main class, which provides an interface to run all the services provided by the tool. It has four main attributes:

▪ MicroarrayObject: an object of the class “Microarray_Dataset” built by calling the initialiseMicroarray() function (see Additional file 2). This holds the normalised gene expression data, and a list of all genes with their fold-change values.
▪ kgmlObject: an object of the class “KGML_Parser” built by calling the parseKGML() function (see Additional file 3). This holds the static structure of all pathways as a list of objects of the “KGML_Path” class, as defined by the KGML format. An object of “KGML_Path” represents the structure of one KEGG pathway and is composed of entriesList, reactionsList, and relationsList (see Additional file 1).
▪ PathsList: a list of objects of the class “Pathway”, created by calling the CreatePathListFromKegg() function (see Additional file 4). This object ultimately holds a list of pathways enriched with reference to a given microarray dataset.
▪ rankedPaths: this object is created by calling the rankPaths() function. It holds the same pathways as PathsList, ranked in descending order of PRS.
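For readers more familiar with other languages, the class structure above can be mirrored as a rough Python analogue. The plain `dict` and `list` fields stand in for the “Microarray_Dataset” and “KGML_Parser” objects. Field names are adapted to Python style and are hypothetical. Only the ranking behaviour described for rankPaths() is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Pathway:
    name: str
    prs: float = 0.0


@dataclass
class Analysis:
    microarray: dict = field(default_factory=dict)    # stands in for MicroarrayObject
    kgml_paths: list = field(default_factory=list)    # stands in for kgmlObject's KGML_Path list
    paths_list: list = field(default_factory=list)    # PathsList: Pathway objects
    ranked_paths: list = field(default_factory=list)  # rankedPaths

    def rank_paths(self):
        """Analogue of rankPaths(): the same pathways, sorted by PRS in descending order."""
        self.ranked_paths = sorted(self.paths_list, key=lambda p: p.prs, reverse=True)
        return self.ranked_paths
```

Using `sorted` rather than sorting in place keeps `paths_list` in its original order, matching the description of rankedPaths as a separate ranked copy of the same pathways.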

Sequence diagrams were used to represent the functions of the PRS tool according to different types of interactions between objects. As an example, Figure 5 represents the main PRS functions with the following steps:

i. Conversion of pathways into graphs by the convertPath2Graph() function, which requires kgmlObject, the object holding the lists of entries, relations and reactions of all pathways.
ii. Using information stored in kgmlObject and PathsList for each graph (see Figure 4), a list of nodes is created (where each node represents one or more genes from the original pathway), together with a list of children for each node.
iii. Removal of redundant genes, which may be represented many times in the same pathway. Two functions are designed to deal with node redundancy: checkNodeRedundancy() and handleNodeRedundancy().
iv. After a graph has been built for each pathway, the graphs are weighted by calling the createWeightedGraphs() function, which uses the DFS algorithm to traverse the nodes of each graph and assign a weight to each significant node, taking loops in the graph into account.
v. A pathway regulation score (PRS) is assigned to each weighted graph using the weights of the significant nodes in the graph and other parameters.
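Step iii does not specify how checkNodeRedundancy() and handleNodeRedundancy() resolve duplicates. The Python sketch below shows one simple way to guarantee that each gene appears only once, offered as an assumption rather than the tool's method. It collapses the node graph into a gene-level graph: every gene becomes a single vertex that inherits the edges of all nodes containing it, and self-loops created by the merge are dropped. The function and argument names are hypothetical.

```python
def gene_level_graph(node_genes, node_children):
    """Collapse a node graph into a gene graph in which every gene appears once.

    node_genes:    dict node -> set of gene IDs represented by that node
    node_children: dict node -> iterable of child nodes
    Returns dict gene -> set of child genes.
    """
    gene_children = {}
    for genes in node_genes.values():
        for gene in genes:
            gene_children.setdefault(gene, set())  # keep genes with no outgoing edges
    for parent, children in node_children.items():
        for child in children:
            for parent_gene in node_genes.get(parent, ()):
                for child_gene in node_genes.get(child, ()):
                    if parent_gene != child_gene:  # drop self-loops created by merging
                        gene_children[parent_gene].add(child_gene)
    return gene_children
```

If gene C appears in two boxes of a KEGG map, it becomes one vertex carrying the outgoing edges of both. Downstream weighting then cannot count C twice.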

We implemented all these classes and functions, and the DFS algorithm, using MATLAB R2010a.

Results and discussion

The objective evaluation of novel enrichment analysis methods is difficult, as it relies on their ability to discern biological processes already known to be perturbed in disease states. We and others have previously attempted this by studying performance across a range of datasets derived from distinct conditions ([27] and references therein). Having extended our algorithm to include biochemical pathways, we performed further analysis on a dataset describing a common metabolic disorder, type 2 diabetes mellitus (T2DM). The data were originally generated by Taneera et al [31], who compared gene expression in RNA isolated from human pancreatic islets of 9 type 2 diabetic (T2D) cadaver donors with RNA from pancreatic islets of 54 non-diabetic cadaver donors. Samples were hybridised to Affymetrix Human Gene 1.0 ST Arrays, and the resulting expression values were normalised by Robust Multi-array Average (RMA) before being uploaded to the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo; accession #GDS4337). We created an input file containing Affymetrix probe IDs and normalised gene expression data for each of the 63 samples. The other parameters required were the number of samples in each group (9 in group 1 and 54 in group 2 in this case), and the fold-change and p-value thresholds used to filter significant genes (here, fold-change ≥1.3 and p-value <0.05). Fold-change thresholds are arbitrary; the value selected in this example yielded enough impacted genes to allow pathway mapping (a threshold of 1.5 would have yielded only 88 DEGs). The user can opt to enrich for signalling or metabolic pathways, or both (as in this example). Additional statistical testing can be performed, if required, by our permutation method (here we used 1000 permutations and a p-value threshold of 0.05).

Tables 1 and 2 display the top ten pathways ranked according to PRS and z-score, respectively; only significant pathways (FDR < 0.05) were selected. A number of processes relevant to T2DM were picked up by both techniques, notably metabolic pathways such as “Arachidonic acid metabolism” [32] and “Fatty acid metabolism” [33],[34], as well as anticipated signalling processes such as the “PPAR signalling pathway” [35],[36]. Both techniques detected “Pathways in cancer”, which is unsurprising as this description encompasses a number of processes perturbed in diabetes, including apoptosis and the cell cycle, along with TGF-beta signalling [37]. “Complement and coagulation cascades” scored highly with both methods, which could be a false positive or may reflect alterations to the vasculature in diabetic islets. Apart from this exception, all other high-scoring PRS pathways are known to be impacted in diabetic states. Conversely, a number of pathways detected by z-scoring are harder to explain, and so may also be false positives (“Intestinal immune network”, “Cell adhesion molecules”, “Allograft rejection”, “Staphylococcus aureus infection”). Finally, the PRS method afforded greater prominence than z-scoring to two pathways critical to T2DM, “MAPK signalling” [38] and “Type II diabetes mellitus” [39]. Indeed, the latter description explicitly reflects the impact on adipocytokine and insulin signalling, which are central to the pathophysiology of diabetes.

Table 1 Top ten pathways ranked by PRS (T2D and pancreatic islets dataset)

Table 2 Top ten pathways ranked by Z-score (T2D and pancreatic islets dataset)

Conclusions

The rapid development of high-throughput genomic technologies and the deposition of their output in open-access databases have produced huge amounts of biological data. Mining and interpreting these data has driven innovation in the field of computational biology, leading to the emergence of sophisticated tools that produce reliable, meaningful and testable results. This is important because such experiments are expensive, and new tools are likely to add value to existing analyses.

In this paper, we address two areas: firstly, the extension of our PRS enrichment algorithm [27] to include both metabolic and signalling pathways; and secondly, a detailed description of a GUI that facilitates array analysis by both PRS and z-scoring. The improved tool handles a number of challenges, notably ID mapping, redundancy in pathway descriptions and statistical significance assessment. Unlike z-scoring, the PRS algorithm takes into account the topology of a pathway (the relationships between genes) and the magnitude of gene expression changes to identify impacted pathways. For these reasons, we argue that PRS enrichment yields more biologically relevant insights than the standard hypergeometric method. It was not feasible to compare performance with other PT methods, as the additional preprocessing steps taken to reduce redundancy in KEGG descriptions cannot easily be implemented in other methods without considerable re-engineering. The behaviour of signalling and metabolic pathways is, of course, distinct. However, as our approach was to assess transcriptional changes in a pathway, rather than to predict an effect on pathway function, we felt it was reasonable to evaluate impact on signalling and biochemical pathways using a single method. In this way, we were able to detect biochemical pathways known to be perturbed in metabolic disease. A key tenet of this kind of analysis is that biomedical scientists are guided in the subsequent investigation of targets revealed by transcriptional profiling studies. Unfortunately, there is no unambiguous statistical test that allows investigators to be certain that any pathway highlighted is worthy of further study (and considerable expense). Permutation-based approaches are commonly used to determine the likelihood of an enrichment score being achieved by chance, and adjusting p-values by FDR can increase investigators’ confidence that a result is meaningful.

In summary, we suggest that providing researchers with a choice of analysis tools, informed by distinct rationales, will allow evidence to be combined or contrasted in order to facilitate more informed decision making.

Availability and requirements

Project name: PRS_software.

Project home page: http://www.buckingham.ac.uk/research/clore-laboratory-diabetes-obesity-and-metabolic-research/staff/maysson-al-haj-ibrahim/prs-tool/.

Operating system(s): Platform independent.

Programming language: MATLAB.

Other requirements: MATLAB R2010a or higher. If MATLAB is not installed on your PC, you need to install the MATLAB Compiler Runtime (MCR) environment first and then run the PRS tool.

Restrictions for use: None.

Authors' contributions

MAI conceived the method, generated the code and performed the testing. SJ provided guidance in the use of the DFS algorithm and assisted with statistical analysis. MAC provided invaluable insights during the development process. KL developed the algorithm in collaboration with MAI and assisted with the biological analysis. All authors were involved in preparing the manuscript, and all approved the final draft.

Additional files

References

1. Glazko GV, Emmert-Streib F: Unite and conquer: univariate and multivariate approaches for finding differentially expressed gene sets. Bioinformatics. 2009, 25 (18): 2348-2354. 10.1093/bioinformatics/btp406.
2. Khatri P, Sirota M, Butte AJ: Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges. PLoS Comput Biol. 2012, 8 (2): e1002375. 10.1371/journal.pcbi.1002375.
3. Dahlquist KD, Salomonis N, Vranizan K, Lawlor SC, Conklin BR: GenMAPP, a new tool for viewing and analyzing microarray data on biological pathways. Nat Genet. 2002, 31 (1): 19-20. 10.1038/ng0502-19.
4. Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, Sunshine M, Narasimhan S, Kane DW, Reinhold WC, Lababidi S, Bussey KJ, Riss J, Barrett JC, Weinstein JN: GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol. 2003, 4 (4): R28. 10.1186/gb-2003-4-4-r28.
5. Khatri P, Draghici S, Ostermeier GC, Krawetz SA: Profiling Gene Expression Using Onto-Express. Genomics. 2002, 79 (2): 266-270. 10.1006/geno.2002.6698.
6. Al-Shahrour F, Díaz-Uriarte R, Dopazo J: FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics. 2004, 20 (4): 578-580. 10.1093/bioinformatics/btg455.
7. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005, 102 (43): 15545. 10.1073/pnas.0506580102.
8. Tian L, Greenberg SA, Kong SW, Altschuler J, Kohane IS, Park PJ: Discovering statistically significant pathways in expression profiling studies. Proc Natl Acad Sci USA. 2005, 102 (38): 13544. 10.1073/pnas.0506577102.
9. Barry WT, Nobel AB, Wright FA: Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics. 2005, 21 (9): 1943-1949. 10.1093/bioinformatics/bti260.
10. Boorsma A, Foat BC, Vis D, Klis F, Bussemaker HJ: T-profiler: scoring the activity of predefined groups of genes using gene expression data. Nucleic Acids Res. 2005, 33: W592-W595. 10.1093/nar/gki484.
11. Kim S-B, Yang S, Kim S-K, Kim SC, Woo HG, Volsky DJ, Kim S-Y, Chu I-S: GAzer: gene set analyzer. Bioinformatics. 2007, 23 (13): 1697-1699. 10.1093/bioinformatics/btm144.
12. Backes C, Keller A, Kuentzer J, Kneissl B, Comtesse N, Elnakady YA, Muller R, Meese E, Lenhof H-P: GeneTrail-advanced gene set enrichment analysis. Nucleic Acids Res. 2007, 35: W186-W192. 10.1093/nar/gkm323.
13. Mitrea C, Taghavi Z, Bokanizad B, Hanoudi S, Tagett R, Donato M, Voichiţa C, Drăghici S: Methods and approaches in the topology-based analysis of biological pathways. Front Physiol. 2013, 4: 278. 10.3389/fphys.2013.00278.
14. MetaCore™. http://thomsonreuters.com/metacore/
15. Glaab E, Baudot A, Krasnogor N, Schneider R, Valencia A: EnrichNet: network-based gene set enrichment analysis. Bioinformatics. 2012, 28 (18): i451-i457. 10.1093/bioinformatics/bts389.
16. Amin K: Pathway-express: A Bioinformatics Tool for Pathway Level Analysis Using Gene Expression Data. 2007.
17. Fang Z, Tian W, Ji H: A network-based gene-weighting approach for pathway analysis. Cell Res. 2012, 22 (3): 565-580. 10.1038/cr.2011.149.
18. Greenblum SI, Efroni S, Schaefer CF, Buetow KH: The PathOlogist: an automated tool for pathway-centric analysis. BMC Bioinformatics. 2011, 12 (1): 133. 10.1186/1471-2105-12-133.
19. Jacob L, Neuvial P, Dudoit S: Gains in power from structured two-sample tests of means on graphs. 2010.
20. Mieczkowski J, Swiatek-Machado K, Kaminska B: Identification of Pathway Deregulation - Gene Expression Based Analysis of Consistent Signal Transduction. PLoS One. 2012, 7 (7): e41541. 10.1371/journal.pone.0041541.
21. Khatri P, Voichita C, Kattan K, Ansari N, Khatri A, Georgescu C, Tarca AL, Draghici S: Onto-Tools: new additions and improvements in 2006. Nucleic Acids Res. 2007, 35: W206-W211. 10.1093/nar/gkm327.
22. Shojaie A, Michailidis G: Analysis of Gene Sets Based on the Underlying Regulatory Network. J Comput Biol. 2009, 16 (3): 407-426. 10.1089/cmb.2008.0081.
23. Rahnenführer J, Domingues FS, Maydt J, Lengauer T: Calculating the Statistical Significance of Changes in Pathway Activity From Gene Expression Data. Stat Appl Genet Mol Biol. 2004, 31: 1544-6115.
24. Gao S, Wang X: TAPPA: topological analysis of pathway phenotype association. Bioinformatics. 2007, 23 (22): 3100-3102. 10.1093/bioinformatics/btm460.
25. Xia J, Wishart DS: MetPA: a web-based metabolomics tool for pathway analysis and visualization. Bioinformatics. 2010, 26 (18): 2342-2344. 10.1093/bioinformatics/btq418.
26. Martini P, Sales G, Massa MS, Chiogna M, Romualdi C: Along signal paths: an empirical gene set approach exploiting pathway topology. Nucleic Acids Res. 2012, 41 (1): e19. 10.1093/nar/gks866.
27. Ibrahim MA, Jassim S, Cawthorne MA, Langlands K: A Topology-Based Score for Pathway Enrichment. J Comput Biol. 2012, 19 (5): 563-573. 10.1089/cmb.2011.0182.
28. Kyoto Encyclopaedia of Genes and Genomes. Data retrieved May 2012 from http://www.genome.jp/kegg/
29. Cheadle C, Vawter MP, Freed WJ, Becker KG: Analysis of Microarray Data Using Z Score Transformation. J Mol Diagn. 2003, 5 (2): 73-81. 10.1016/S1525-1578(10)60455-2.
30. Unified Modeling Language™ (UML®). http://www.uml.org/
31. Taneera J, Lang S, Sharma A, Fadista J, Zhou Y, Ahlqvist E, Jonsson A, Lyssenko V, Vikman P, Hansson O, Parikh H, Korsgren O, Soni A, Krus U, Zhang E, Jing X-J, Esguerra JLS, Wollheim CB, Salehi A, Rosengren A, Renström E, Groop L: A Systems Genetics Approach Identifies Genes and Pathways for Type 2 Diabetes in Human Islets. Cell Metab. 2012, 16 (1): 122-134. 10.1016/j.cmet.2012.06.006.
32. Persaud SJ, Muller D, Belin VD, Kitsou-Mylona I, Asare-Anane H, Papadimitriou A, Burns CJ, Huang GC, Amiel SA, Jones PM: The Role of Arachidonic Acid and Its Metabolites in Insulin Secretion From Human Islets of Langerhans. Diabetes. 2007, 56 (1): 197-203. 10.2337/db06-0490.
33. Yaney GC, Corkey BE: Fatty acid metabolism and insulin secretion in pancreatic beta cells. Diabetologia. 2003, 46 (10): 1297-1312. 10.1007/s00125-003-1207-4.
34. McGarry JD: Banting lecture 2001: Dysregulation of fatty acid metabolism in the etiology of type 2 diabetes. Diabetes. 2002, 51 (1): 7-18. 10.2337/diabetes.51.1.7.
35. Sugden MC, Holness MJ: Potential Role of Peroxisome Proliferator-Activated Receptor-α in the Modulation of Glucose-Stimulated Insulin Secretion. Diabetes. 2004, 53 (1): S71-S81. 10.2337/diabetes.53.2007.S71.
36. Kim H-S, Hwang Y-C, Koo S-H, Park KS, Lee M-S, Kim K-W, Lee M-K: PPAR-γ Activation Increases Insulin Secretion through the Up-regulation of the Free Fatty Acid Receptor GPR40 in Pancreatic β-Cells. PLoS One. 2013, 8 (1): e50128. 10.1371/journal.pone.0050128.
37. Prentki M, Nolan CJ: Islet cell failure in type 2 diabetes. J Clin Invest. 2006, 116 (7): 1802-1812. 10.1172/JCI29103.
38. Tomas A, Yermen B, Min L, Pessin JE, Halban PA: Regulation of pancreatic β-cell insulin secretion by actin cytoskeleton remodelling: role of gelsolin and cooperation with the MAPK signalling pathway. J Cell Sci. 2006, 119 (10): 2156-2167. 10.1242/jcs.02942.
39. Tanizawa Y, Riggs AC, Chiu KC, Janssen RC, Bell DS, Go RPC, Roseman JM, Acton MT, Permutt MA: Variability of the pancreatic islet beta cell/liver (GLUT 2) glucose transporter gene in NIDDM patients. Diabetologia. 1994, 37 (4): 420-427. 10.1007/BF00408481.

Acknowledgments

We wish to thank Dr Madhumita Das, a Master's student in our group, for her feedback on the tool. We also would like to thank the Buckingham Institute of Translational Medicine for funding this research.

Author information

Authors and Affiliations

Department of Applied Computing, the University of Buckingham, Buckingham, MK18 1EG, UK
Maysson Ibrahim & Sabah Jassim
The Buckingham Institute for Translational Medicine, the University of Buckingham, Buckingham, MK18 1EG, UK
Maysson Ibrahim, Michael Anthony Cawthorne & Kenneth Langlands

Corresponding author

Correspondence to Maysson Ibrahim.

Additional information

Competing interests

The authors declare that they have no competing interests.

Electronic supplementary material

Additional file 1: Objects forming KEGG pathways represented in a KGML file. (PDF 134 KB)

Additional file 2: UML sequence diagram representing the implementation of the “initialise” microarray function. (PDF 128 KB)

Additional file 3: UML sequence diagram of the parse KGML function. (PDF 7 KB)

Additional file 4: UML sequence diagram representing the implementation details of the process of creating the list of pathways from KEGG and mapping microarray data onto them. (PDF 35 KB)

Rights and permissions

This article is published under an open access license. Please check the 'Copyright Information' section either on this page or in the PDF for details of this license and what re-use is permitted. If your intended use exceeds what is permitted by the license or if you are unable to locate the licence and re-use information, please contact the Rights and Permissions team.

About this article

Cite this article

Ibrahim, M., Jassim, S., Cawthorne, M.A. et al. A MATLAB tool for pathway enrichment using a topology-based pathway regulation score. BMC Bioinformatics 15, 358 (2014). https://doi.org/10.1186/s12859-014-0358-2

Received: 30 June 2014
Accepted: 22 October 2014
Published: 04 November 2014
DOI: https://doi.org/10.1186/s12859-014-0358-2
