An experimental study of information content measurement of gene ontology terms

  • Marianna Milano
  • Giuseppe Agapito
  • Pietro H. Guzzi
  • Mario Cannataro
Original Article


The gene ontology (GO) is commonly used to store and organize information about the functions of biological molecules through a controlled vocabulary of terms (GO Terms). GO Terms are linked to biological concepts through the annotation process, and researchers use many different annotation processes. Each term has a different specificity, which is formally measured by its information content (IC). Both the structure of GO and the corpora of annotations change continuously as new experimental findings emerge. This work focuses on how changes in annotations affect the IC of terms. The study confirms that, for each species, statistically significant differences occur among the annotation corpora of different years. These results indicate that changes in annotation corpora have a high impact on IC.
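
To make the dependence of IC on the annotation corpus concrete, the following is a minimal sketch, not the authors' procedure. It assumes the common annotation-based definition IC(t) = -log p(t), where p(t) is the fraction of annotations that fall on term t or any of its descendants; the abstract does not state which IC measure the study uses. The toy ontology, the term names, the gene identifiers, and the two corpus snapshots are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy ontology: each term maps to its parent terms (child -> parents).
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
}

def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content(corpus):
    """Annotation-based IC (assumed definition): -log of the fraction of
    annotations that fall on a term or on one of its descendants."""
    counts = Counter()
    for _gene, term in corpus:
        # An annotation to a term also counts for every ancestor of that term.
        counts.update(ancestors(term))
    total = len(corpus)
    return {t: -math.log(c / total) for t, c in counts.items()}

# Two hypothetical snapshots of an annotation corpus, as (gene, term) pairs.
corpus_old = [("g1", "dna_binding"), ("g2", "catalysis")]
corpus_new = corpus_old + [("g3", "catalysis"), ("g4", "catalysis")]

ic_old = information_content(corpus_old)
ic_new = information_content(corpus_new)

for term in sorted(ic_old):
    print(f"{term}: old IC = {ic_old[term]:.3f}, new IC = {ic_new[term]:.3f}")
```

In this toy case the ontology itself is unchanged between the two snapshots, yet adding annotations to one term lowers its IC and raises the IC of an unrelated term, because every probability is normalized by the size of the whole corpus.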


Information content, Gene ontology, Semantic similarity



This work has been partially funded by MIUR through the project PON Smartcities DICET-INMOTO-ORCHESTRA PON04a2 D.



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. Department of Surgical and Medical Sciences, University of Catanzaro, Catanzaro, Italy
