Augmenting Concept Definition in Gloss Vector Semantic Relatedness Measure using Wikipedia Articles

  • Ahmad Pesaranghader
  • Ali Pesaranghader
  • Azadeh Rezaei
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 285)


Semantic relatedness measures are widely used in text mining and information retrieval applications. In this paper we attempt to improve the Gloss Vector relatedness measure so that it estimates the relatedness between two given concepts more accurately. The measure constructs concept definitions (glosses) from a thesaurus and computes relatedness from the angle between the concepts’ gloss vectors. Constructing these definitions is challenging, however, because thesauruses do not provide expressive definitions for every concept, particularly for specialized ones. We aim to augment these definitions using Wikipedia articles and other external resources. Applying both definition types to the biomedical domain, with MEDLINE as the corpus, UMLS as the default thesaurus, and a reference standard of 68 concept pairs manually rated for relatedness, we show that exploiting resources available on the Web has a positive impact on the final measurement of semantic relatedness.
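The angle-based comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a gloss vector is the sum of the corpus co-occurrence vectors of the words in a concept's definition, and that augmentation simply appends extra text (for example, from a Wikipedia article) to the thesaurus definition. The function names, the `window` parameter, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def cooccurrence_vectors(corpus, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = {}
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            context = tokens[lo:i] + tokens[i + 1:hi]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def gloss_vector(definition, word_vectors):
    """Sum the co-occurrence vectors of the words in a definition (assumed construction)."""
    total = Counter()
    for word in definition.lower().split():
        total.update(word_vectors.get(word, Counter()))
    return total


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relatedness(def_a, def_b, word_vectors, extra_a="", extra_b=""):
    """Compare two concepts; `extra_*` is optional augmenting text (hypothetical)."""
    a = gloss_vector(def_a + " " + extra_a, word_vectors)
    b = gloss_vector(def_b + " " + extra_b, word_vectors)
    return cosine(a, b)


# Toy corpus standing in for MEDLINE; real use would need a far larger corpus.
toy_corpus = [
    "heart attack damages heart muscle",
    "aspirin reduces heart attack risk",
]
word_vectors = cooccurrence_vectors(toy_corpus)
score = relatedness("heart attack", "aspirin", word_vectors)
```

Passing text through `extra_a` or `extra_b` changes which co-occurrence vectors contribute to the sum, which is how an augmented definition would alter the score in this sketch.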


Semantic relatedness; Biomedical text mining; Web mining; Bioinformatics; UMLS; MEDLINE; Wikipedia; Natural language processing





Copyright information

© Springer Science+Business Media Singapore 2014

Authors and Affiliations

  • Ahmad Pesaranghader (1)
  • Ali Pesaranghader (2)
  • Azadeh Rezaei (1)
  1. Jalan Multimedia, Multimedia University (MMU), Cyberjaya, Malaysia
  2. Universiti Putra Malaysia (UPM), Serdang, Malaysia
