Integrating Multiple-Platform Expression Data through Gene Set Features
We demonstrate a set-level approach to integrating multiple-platform gene expression data for predictive classification and show its utility for boosting classification performance when single-platform samples are rare. We explore three ways of defining gene sets, including a novel one based on the notion of a fully coupled flux in metabolic pathways. In two tissue classification tasks, we show empirically that the gene-set-based approach is useful for combining heterogeneous expression data. Surprisingly, in experiments constrained to a single platform, biologically meaningful gene sets used as sample features are often outperformed by random gene sets with no biological relevance.
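To make the set-level idea concrete, the following is a minimal sketch, not the authors' method: the abstract does not say how gene expression is aggregated into a set-level feature, so the sketch assumes a simple mean of per-gene z-scores computed within each platform. The function names `gene_set_features` and `combine_platforms`, the gene identifiers, and the toy data are all hypothetical.

```python
import numpy as np

def gene_set_features(expr, genes, gene_sets):
    """Map a samples-x-genes matrix to one feature per gene set.

    Assumption: a set's feature is the mean z-score of its member genes
    that are measured on this platform (the source does not specify this).
    Returns a dict: set name -> vector of length n_samples.
    """
    mu = expr.mean(axis=0)
    sd = expr.std(axis=0)
    sd[sd == 0] = 1.0                      # avoid division by zero for constant genes
    z = (expr - mu) / sd                   # standardize each gene within the platform
    index = {g: i for i, g in enumerate(genes)}
    features = {}
    for name, members in gene_sets.items():
        idx = [index[g] for g in members if g in index]
        if idx:                            # skip sets with no measured genes
            features[name] = z[:, idx].mean(axis=1)
    return features

def combine_platforms(platforms, gene_sets):
    """Stack samples from several platforms in a shared gene-set feature space."""
    per_platform = [gene_set_features(e, g, gene_sets) for e, g in platforms]
    shared = sorted(set.intersection(*(set(f) for f in per_platform)))
    if not shared:
        raise ValueError("no gene set is covered by every platform")
    blocks = [np.column_stack([f[n] for n in shared]) for f in per_platform]
    return np.vstack(blocks), shared

# Toy data: two platforms measuring overlapping but different genes.
platform_a = (np.array([[1.0, 2.0, 3.0],
                        [2.0, 1.0, 5.0],
                        [4.0, 6.0, 1.0]]), ["g1", "g2", "g3"])
platform_b = (np.array([[0.5, 1.5, 2.5],
                        [1.5, 0.5, 4.5]]), ["g2", "g3", "g4"])
sets = {"S1": ["g1", "g2"], "S2": ["g3", "g4"]}

X, names = combine_platforms([platform_a, platform_b], sets)
```

Here `X` holds all five samples described by the same two set-level features, even though the platforms do not measure the same genes, so a single classifier could be trained on the pooled rows.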
Keywords: Heterogeneous Platform, Expression Sample, Random Subspace Method, Pathway Activation Level, Gene Base Representation