Using Semantic Similarities and csbl.go for Analyzing Microarray Data

  • Kristian Ovaska
Part of the Methods in Molecular Biology book series (MIMB, volume 1375)


Cellular phenotypes result from the combined effect of multiple genes, and high-throughput techniques such as DNA microarrays and deep sequencing allow monitoring this genomic complexity. The large scale of the resulting data, however, creates challenges for interpreting results, as primary analysis often yields hundreds of genes. Gene Ontology (GO), a controlled vocabulary for gene products, enables semantic analysis of such gene sets. GO can be used to define semantic similarity between genes, which enables semantic clustering to reduce the complexity of a result set. Here, we describe how to compute semantic similarities and perform GO-based gene clustering using csbl.go, an R package for GO semantic similarity. We demonstrate the approach with expression profiles from breast cancer.
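To make the idea of GO-based semantic similarity concrete, the following is a minimal sketch in Python of one widely used information-content measure (the Resnik measure). It is not csbl.go's API or implementation: the toy ontology, the term names, the gene names, and the choice of taking the maximum over term pairs for gene-level similarity are all assumptions made for illustration.

```python
import math

# Toy ontology (hypothetical term names, not real GO identifiers): term -> parents.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "signaling": ["root"],
    "lipid_metabolism": ["metabolism"],
    "sugar_metabolism": ["metabolism"],
}

# Hypothetical direct annotations: gene -> set of terms.
ANNOTATIONS = {
    "geneA": {"lipid_metabolism"},
    "geneB": {"sugar_metabolism"},
    "geneC": {"signaling"},
    "geneD": {"lipid_metabolism"},
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = log(N / n_t), where n_t counts genes annotated to t or any descendant."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)  # annotations propagate to ancestors
        for term in covered:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: math.log(total / c) for t, c in counts.items() if c > 0}


IC = information_content()


def resnik(term1, term2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(term1) & ancestors(term2)
    return max(IC[t] for t in common)


def gene_similarity(gene1, gene2):
    """Gene-level similarity as the maximum over all term pairs (one of several conventions)."""
    return max(
        resnik(a, b) for a in ANNOTATIONS[gene1] for b in ANNOTATIONS[gene2]
    )


if __name__ == "__main__":
    genes = sorted(ANNOTATIONS)
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            print(g1, g2, round(gene_similarity(g1, g2), 3))
```

A pairwise matrix built this way can be converted to distances and passed to any standard hierarchical clustering routine, which is the general idea behind grouping a long gene list into functionally coherent clusters.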


Gene ontology, Semantic similarity, Measure, Hierarchical clustering, Expression microarray, Data analysis



I thank Tiia Pelkonen for proofreading.


Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. University of Helsinki, Helsinki, Finland
