Computing Semantic Similarity for Vietnamese Concepts Using Wikipedia

  • Hien T. Nguyen
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 341)


Evaluating semantic similarity between concepts is a common component of many applications that deal with textual data, such as information extraction, information retrieval, natural language processing, and knowledge acquisition. This paper presents an approach to assessing semantic similarity between Vietnamese concepts using Vietnamese Wikipedia. First, the structure of Vietnamese Wikipedia is exploited to derive a Vietnamese ontology. Next, based on the obtained ontology, we apply similarity measures from the literature to evaluate the semantic similarity between Vietnamese concepts. We then conduct an experiment in which 18 human subjects assess the similarity of 30 Vietnamese concept pairs. Finally, we use the Pearson product-moment correlation coefficient to estimate the correlation between the human judgments and the scores produced by the similarity measures. The experimental results show that our system achieves fairly good performance and that similarity measures between Vietnamese concepts have the potential to improve the performance of applications dealing with textual data.
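The abstract does not name the specific measures used, so the following is only an illustrative sketch of the pipeline it describes. It assumes, as is common for Wikipedia-derived taxonomies, that the ontology is a graph of categories with links to their parent categories, and it applies two widely used taxonomy-based measures (shortest path length and Wu-Palmer similarity) before correlating the scores with human ratings via the Pearson coefficient. The taxonomy, concept pairs, and ratings (`PARENTS`, `PAIRS`, `HUMAN_MEANS`) are hypothetical, as are all helper function names; none of this is the paper's data or implementation.

```python
from collections import deque
from math import sqrt

# HYPOTHETICAL toy taxonomy: each category maps to its parent categories.
# The paper derives its ontology from Vietnamese Wikipedia; these names and
# links are invented for illustration and are not the paper's data.
PARENTS = {
    "Động vật": [],        # "Animal" (the only root here)
    "Thú": ["Động vật"],   # "Mammal"
    "Chim": ["Động vật"],  # "Bird"
    "Chó": ["Thú"],        # "Dog"
    "Mèo": ["Thú"],        # "Cat"
    "Gà": ["Chim"],        # "Chicken"
}


def upward_distances(concept):
    """BFS over parent links: {ancestor (incl. the concept itself): edge distance}."""
    dist = {concept: 0}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        for parent in PARENTS[node]:
            if parent not in dist:  # the visited check also guards against category cycles
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def depth(concept):
    """Number of nodes on the shortest path from the concept up to a root (a root has depth 1)."""
    dist = upward_distances(concept)
    roots = [a for a in dist if not PARENTS[a]]
    if not roots:
        raise ValueError(f"{concept!r} does not reach a root")
    return 1 + min(dist[r] for r in roots)


def lowest_common_subsumer(c1, c2):
    """Common ancestor on the shortest c1-c2 path, plus each concept's distance to it."""
    d1, d2 = upward_distances(c1), upward_distances(c2)
    common = d1.keys() & d2.keys()
    if not common:
        raise ValueError(f"{c1!r} and {c2!r} share no ancestor")
    # Shortest path first; ties go to the deeper (more specific) ancestor, then by name.
    lcs = min(common, key=lambda a: (d1[a] + d2[a], -depth(a), a))
    return lcs, d1[lcs], d2[lcs]


def shortest_path_length(c1, c2):
    """Edge count of the shortest path linking c1 and c2 through a common ancestor."""
    _, n1, n2 = lowest_common_subsumer(c1, c2)
    return n1 + n2


def wu_palmer(c1, c2):
    """Wu-Palmer similarity: 2*N3 / (N1 + N2 + 2*N3), with N3 = depth of the subsumer."""
    lcs, n1, n2 = lowest_common_subsumer(c1, c2)
    n3 = depth(lcs)
    return 2 * n3 / (n1 + n2 + 2 * n3)


def pearson(xs, ys):
    """Pearson r = sum((x-mx)(y-my)) / sqrt(sum((x-mx)^2) * sum((y-my)^2))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# HYPOTHETICAL concept pairs and mean human ratings (assumed 0-4 scale), invented for the demo.
PAIRS = [("Chó", "Mèo"), ("Chó", "Gà"), ("Gà", "Chim"), ("Mèo", "Chim")]
HUMAN_MEANS = [3.1, 1.2, 3.5, 1.0]

if __name__ == "__main__":
    scores = [wu_palmer(a, b) for a, b in PAIRS]
    for (a, b), s in zip(PAIRS, scores):
        print(f"{a} / {b}: path={shortest_path_length(a, b)}, wu_palmer={s:.3f}")
    print(f"Pearson r vs. human ratings: {pearson(scores, HUMAN_MEANS):.3f}")
```

Choosing the subsumer by shortest path rather than by maximum depth matters once a category has several parents, as Wikipedia categories often do; in a single-rooted tree like the toy one, both choices coincide. In a real experiment, `HUMAN_MEANS` would hold each pair's rating averaged over all subjects.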


Keywords: Similarity Measure; Semantic Similarity; Semantic Relatedness; Short Path Length; Concept Pair
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
