An efficient approach for measuring semantic relatedness using Wikipedia bidirectional links

Applied Intelligence

Abstract

Measuring the semantic relatedness between concepts is a fundamental research topic in natural language processing. Among Wikipedia-based measures, the link-based model is the most promising because the manually defined links in Wikipedia are refined and close to human semantics. This paper proposes a Wikipedia two-way link model that extends the existing Wikipedia one-way out-link model. The proposed model has a low dimension and high efficiency, and it is easy to implement and reproduce. First, the model combines the out-links and in-links of a concept in Wikipedia into a bidirectional link vector that serves as the concept's semantic interpreter, and it uses a TF*IDF-based bidirectional weighting method to uniformly calculate the strength of the mutual association between a given concept and each of its out-link or in-link concepts. Second, we propose a disambiguation strategy based on the social awareness of senses: it takes the out-links of a disambiguation page in the order in which they occur on that page and uses an adjustable threshold to determine how many senses are selected. Third, we propose new vector similarity metrics based on logarithms and exponents to improve the overall performance of semantic relatedness measurement based on Wikipedia links. Experimental results on several well-recognized datasets show that our model outperforms the popular Naïve Explicit Semantic Analysis (Naïve-ESA) and Wikipedia Out-Link vector-based Measure (WOLM) methods on current Wikipedia versions, and that the bidirectional link model significantly improves on the existing one-way link model in practical applications.
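The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy link graph, the function names, the TF*IDF-style weight (link-direction count times the log of inverse neighbour count), and the use of plain cosine similarity in place of the paper's logarithm- and exponent-based metrics are all assumptions made for illustration, since the abstract does not give the exact formulas.

```python
import math
from collections import defaultdict
from itertools import product

# Hypothetical toy link graph: article title -> titles it links to.
OUT_LINKS = {
    "Car": ["Engine", "Wheel", "Road"],
    "Truck": ["Engine", "Wheel", "Cargo"],
    "Engine": ["Car", "Truck", "Fuel"],
    "Wheel": ["Car"],
    "Road": ["Car", "Truck"],
    "Cargo": ["Truck"],
    "Fuel": ["Engine"],
    "Banana": ["Fruit"],
    "Fruit": ["Banana"],
}


def invert(out_links):
    """Derive in-links (which articles link to me) from out-links."""
    in_links = defaultdict(list)
    for source, targets in out_links.items():
        for target in targets:
            in_links[target].append(source)
    return dict(in_links)


IN_LINKS = invert(OUT_LINKS)
N_CONCEPTS = len(OUT_LINKS)


def neighbours(concept):
    """Map each linked concept to the number of link directions (1 or 2)."""
    counts = defaultdict(int)
    for target in set(OUT_LINKS.get(concept, [])):
        counts[target] += 1
    for source in set(IN_LINKS.get(concept, [])):
        counts[source] += 1
    return dict(counts)


def bidirectional_vector(concept):
    """Bidirectional link vector with an assumed TF*IDF-style weight."""
    vector = {}
    for other, tf in neighbours(concept).items():
        # Assumption: concepts with fewer neighbours are more informative.
        idf = math.log(N_CONCEPTS / len(neighbours(other)))
        vector[other] = tf * idf
    return vector


def cosine(u, v):
    """Plain cosine similarity over sparse dict vectors (a stand-in metric)."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def relatedness(a, b):
    """Relatedness of two concepts from their bidirectional link vectors."""
    return cosine(bidirectional_vector(a), bidirectional_vector(b))


def select_senses(disambiguation_links, threshold):
    """Keep the first `threshold` senses in disambiguation-page order."""
    return disambiguation_links[:threshold]


def word_relatedness(senses_a, senses_b, threshold):
    """Assumption: score two words by their best-matching pair of senses."""
    pairs = product(select_senses(senses_a, threshold),
                    select_senses(senses_b, threshold))
    return max((relatedness(a, b) for a, b in pairs), default=0.0)
```

In this toy graph, `relatedness("Car", "Truck")` is positive because the two concepts share linked neighbours, while `relatedness("Car", "Banana")` is zero because they share none. Raising `threshold` in `word_relatedness` admits more candidate senses per word at the cost of more pairwise comparisons.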

Notes

  1. https://dkpro.github.io/dkpro-jwpl/

  2. https://www.mturk.com/

  3. http://wacky.sslmit.unibo.it/doku.php?id=corpora

Acknowledgments

This work was supported by the National Natural Science Foundation of China under contract numbers 61462010 and 61363036, the Natural Science Foundation of Guangxi of China under contract number 2018GXNSFAA138087, the Innovation Project of Guangxi Graduate Education under contract number XYCSZ2019064, and the Graduate Technological Innovation Project of Beijing Institute of Technology under contract number 2018CX20027.

Author information

Corresponding author

Correspondence to Bo Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhu, X., Guo, Q., Zhang, B. et al. An efficient approach for measuring semantic relatedness using Wikipedia bidirectional links. Appl Intell 49, 3708–3730 (2019). https://doi.org/10.1007/s10489-019-01452-1

  • DOI: https://doi.org/10.1007/s10489-019-01452-1
