Gene Ontology Semantic Similarity Analysis Using GOSemSim

  • Guangchuang Yu
Part of the Methods in Molecular Biology book series (MIMB, volume 2117)


The GOSemSim package, an R-based tool within the Bioconductor project, offers several methods, based on information content and graph structure, for measuring semantic similarity among GO terms, gene products, and gene clusters. In this chapter, I illustrate the use of GOSemSim on a list of regulators in preimplantation embryos. A step-by-step analysis is provided, along with instructions for interpreting and visualizing the results. GOSemSim is open-source and is available from
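As a rough illustration of what an information-content-based measure computes, the sketch below scores pairs of terms in a tiny made-up ontology using Resnik's measure (the information content of the most informative common ancestor) and Lin's normalized variant. It is not GOSemSim's implementation and does not use its API: the term names, annotation counts, and function names are all hypothetical, chosen only to keep the example self-contained.

```python
import math

# Hypothetical toy ontology (child -> parents); these are not real GO terms.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}

# Hypothetical number of gene products annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "binding": 2,
    "catalysis": 8,
    "dna_binding": 6,
    "rna_binding": 4,
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), with p(t) the share of annotations at t or below it."""
    cumulative = dict.fromkeys(PARENTS, 0)
    for term, count in DIRECT_COUNTS.items():
        # An annotation to a term also counts for every ancestor of that term.
        for node in ancestors(term):
            cumulative[node] += count
    total = cumulative["root"]
    # Assumes every term has at least one annotation at or below it.
    return {term: math.log(total / c) for term, c in cumulative.items()}


def resnik(t1, t2, ic):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(ic[t] for t in common)


def lin(t1, t2, ic):
    """Lin similarity: Resnik score normalized by the two terms' own IC."""
    denom = ic[t1] + ic[t2]
    return 2 * resnik(t1, t2, ic) / denom if denom > 0 else 0.0


ic = information_content()
print("Resnik:", round(resnik("dna_binding", "rna_binding", ic), 3))
print("Lin:   ", round(lin("dna_binding", "rna_binding", ic), 3))
```

In this toy setting, two sibling terms under a shared, moderately specific parent receive a positive score, whereas terms whose only common ancestor is the root score zero, because the root carries no information.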

Key words

Semantic similarity, GOSemSim, Gene ontology, Functional prediction, Reproducible research



Acknowledgments

I thank Drs. Yin Ge and Zhongtian Xu for their useful feedback and helpful comments on the manuscript. This work was supported by startup funds from Southern Medical University (G618289088).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  1. Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
