Functional Interpretation of Gene Sets: Semantic-Based Clustering of Gene Ontology Terms on the BioTest Platform

  • Aleksandra Gruca
  • Roman Jaksik
  • Krzysztof Psiuk-Maksymowicz
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 659)


Modern high-throughput technologies based on genome, transcriptome, or proteome profiling provide an abundance of data that needs to be processed, analyzed and, finally, interpreted. Effective and efficient analysis of data coming from molecular profiling is crucial for detailed diagnosis, prognosis, and prediction of therapy outcome. Meaningful conclusions can be drawn only with sophisticated methods for biomedical and molecular data analysis and interpretation. In this study we present an approach for functional interpretation of gene or protein sets using clusters of Gene Ontology (GO) terms. We analyze transcription profiles of the human cell line K562 and show that clustering groups functionally related GO terms and thereby yields a more concise and comprehensive description. By applying a cluster-specific data aggregation tool, we are able to calculate statistics for individual clusters of GO terms and compare the number of differentially expressed genes between two sample pairs. The presented tool is implemented as part of the annotation module available on the BioTest remote platform for hypothesis testing and analysis of biomedical data.
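The general idea of grouping GO terms by semantic similarity and then aggregating statistics per cluster can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the clustering algorithm, so the example below makes assumptions throughout: the term labels, similarity values, gene identifiers, and the 0.5 threshold are invented toy data, and a simple single-linkage grouping stands in for whatever method the platform actually uses.

```python
from itertools import combinations

# Toy pairwise semantic similarities between hypothetical GO term labels.
# All values are made up for illustration only.
SIM = {
    frozenset(("term_A", "term_B")): 0.9,
    frozenset(("term_A", "term_C")): 0.2,
    frozenset(("term_A", "term_D")): 0.1,
    frozenset(("term_B", "term_C")): 0.3,
    frozenset(("term_B", "term_D")): 0.2,
    frozenset(("term_C", "term_D")): 0.8,
}

# Hypothetical term-to-gene annotations and differentially expressed (DE)
# gene sets for two sample pairs.
TERM_GENES = {
    "term_A": {"g1", "g2"},
    "term_B": {"g2", "g3"},
    "term_C": {"g4"},
    "term_D": {"g4", "g5"},
}
DE_PAIR_1 = {"g1", "g3", "g4"}
DE_PAIR_2 = {"g2"}


def cluster_terms(terms, sim, threshold):
    """Single-linkage grouping: merge two clusters whenever any pair of
    their terms has similarity >= threshold. Returns sorted lists."""
    clusters = [{t} for t in terms]
    merged = True
    while merged:
        merged = False
        for a, b in combinations(clusters, 2):
            if any(sim.get(frozenset((x, y)), 0.0) >= threshold
                   for x in a for y in b):
                clusters.remove(a)
                clusters.remove(b)
                clusters.append(a | b)
                merged = True
                break  # restart the scan on the updated cluster list
    return sorted(sorted(c) for c in clusters)


def de_count(cluster, term_genes, de_genes):
    """Number of distinct DE genes annotated with any term in the cluster."""
    genes = set().union(*(term_genes[t] for t in cluster))
    return len(genes & de_genes)


clusters = cluster_terms(sorted(TERM_GENES), SIM, threshold=0.5)
counts = [(c, de_count(c, TERM_GENES, DE_PAIR_1), de_count(c, TERM_GENES, DE_PAIR_2))
          for c in clusters]
for cluster, n1, n2 in counts:
    print(cluster, "pair 1:", n1, "pair 2:", n2)
```

Counting distinct genes per cluster, rather than summing per-term counts, avoids double counting genes annotated with several related terms, which is one reason a cluster-level summary can be more concise than a flat list of terms.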


Gene Ontology, Clustering, Semantic similarity, BioTest platform, DNA microarrays, Molecular profiling, Functional interpretation



This work was partially supported by The National Centre for Research and Development grant No PBS3/B3/32/2015 and was carried out in part within the statutory research project of the Institute of Informatics (RAU2). The presented system was developed and installed on the infrastructure of the Ziemowit computer cluster in the Laboratory of Bioinformatics and Computational Biology, The Biotechnology, Bioengineering and Bioinformatics Centre Silesian BIO-FARMA, created in the POIG.02.01.00-00-166/08 and expanded in the POIG.02.03.01-00-040/13 projects.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Aleksandra Gruca (1)
  • Roman Jaksik (2)
  • Krzysztof Psiuk-Maksymowicz (2)
  1. Institute of Informatics, Silesian University of Technology, Gliwice, Poland
  2. Institute of Automatic Control, Silesian University of Technology, Gliwice, Poland
