Integrating Multiple-Platform Expression Data through Gene Set Features

  • Matěj Holec
  • Filip Železný
  • Jiří Kléma
  • Jakub Tolar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5542)


We demonstrate a set-level approach to the integration of multiple-platform gene expression data for predictive classification and show its utility for boosting classification performance when single-platform samples are rare. We explore three ways of defining gene sets, including a novel way based on the notion of a fully coupled flux related to metabolic pathways. In two tissue classification tasks, we empirically show that the gene-set-based approach is useful for combining heterogeneous expression data. Surprisingly, in experiments constrained to a single platform, biologically meaningful gene sets acting as sample features are often outperformed by random gene sets with no biological relevance.
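The idea of gene sets acting as sample features can be illustrated with a minimal sketch. It assumes that a gene-set feature is the mean of per-sample standardized expression over the set's measured members; the abstract does not state the aggregation rule, so this choice is an assumption. The gene names, set names, and expression values below are hypothetical.

```python
from statistics import mean, pstdev


def standardize(sample):
    """Z-score one sample's expression values so platforms share a scale."""
    values = list(sample.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {gene: 0.0 for gene in sample}
    return {gene: (value - mu) / sigma for gene, value in sample.items()}


def gene_set_features(sample, gene_sets):
    """Map one sample (gene -> expression) to gene-set-level features.

    Each feature is the mean standardized expression of the set members
    that the sample's platform measures (assumed aggregation rule);
    the feature is None when no member of the set is measured.
    """
    z = standardize(sample)
    features = {}
    for name, genes in gene_sets.items():
        measured = [z[g] for g in genes if g in z]
        features[name] = mean(measured) if measured else None
    return features


# Hypothetical gene sets (e.g. pathways) and two samples from platforms
# that measure different genes on different intensity scales.
gene_sets = {"set_A": {"g1", "g2", "g3"}, "set_B": {"g4", "g5"}}
sample_platform_1 = {"g1": 8.0, "g2": 9.0, "g4": 3.0, "g5": 4.0}
sample_platform_2 = {"g2": 120.0, "g3": 150.0, "g5": 40.0, "g6": 60.0}

# Both samples land in the same feature space, so they can be pooled
# into one training set for a classifier.
pooled = [
    gene_set_features(s, gene_sets)
    for s in (sample_platform_1, sample_platform_2)
]
```

Because the features are indexed by gene set rather than by probe or gene, samples whose platforms cover different genes can still be pooled, which is the property the abstract relies on when single-platform samples are rare.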


Heterogeneous Platform, Expression Sample, Random Subspace Method, Pathway Activation Level, Gene Base Representation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Matěj Holec (1)
  • Filip Železný (1)
  • Jiří Kléma (1)
  • Jakub Tolar (2)
  1. Czech Technical University, Prague, Czech Republic
  2. University of Minnesota, Minneapolis, USA
