Semantic Relatedness Measurement from Wikipedia and WordNet Using Modified Normalized Google Distance

Karve, Saket; Shende, Vasisht; Hople, Swaroop

doi:10.1007/978-981-13-2514-4_13


Conference paper
First Online: 05 November 2018

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 43)

Abstract

Many applications in natural language processing require semantic relatedness between words to be quantified. Existing WordNet-based approaches fail for non-dictionary words, jargon, and some proper nouns. The meanings of terms also evolve over the years, and these changes are not reflected in WordNet. However, WordNet cannot be ignored, as it considers the semantics of the language along with its contextual meaning. Hence, we propose a method that uses data from Wikipedia and WordNet’s Brown corpus to calculate semantic relatedness using a modified form of Normalized Google Distance (NGD). The modified NGD incorporates word sense derived from WordNet and occurrence counts over the data from Wikipedia. Through experiments performed on a set of selected word pairs, we found that the proposed method calculates relatedness that correlates significantly with human intuition.
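
For reference, the unmodified NGD of Cilibrasi and Vitanyi is commonly written as NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f counts occurrences and N is the size of the corpus. A minimal Python sketch of that standard formula follows. It is not the modified measure proposed in the paper, whose details the abstract does not give, and the function name, argument names, and example counts are hypothetical.

```python
import math


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Standard (unmodified) Normalized Google Distance from raw counts.

    fx, fy -- number of documents containing term x, term y (assumed inputs)
    fxy    -- number of documents containing both terms
    n      -- total number of documents in the corpus
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term must occur at least once")
    if n <= min(fx, fy):
        raise ValueError("corpus size must exceed the rarer term's count")
    if fxy <= 0:
        # Terms never co-occur: the distance is unbounded.
        return math.inf

    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


# Hypothetical counts: two terms that always appear together.
print(ngd(100, 100, 100, 10_000))  # → 0.0
```

Smaller values indicate more closely related terms; a pair that always co-occurs has distance 0, and a pair that never co-occurs has infinite distance.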

Author information

Authors and Affiliations

Veermata Jijabai Technological Institute, Mumbai, India
Saket Karve, Vasisht Shende & Swaroop Hople

Corresponding author

Correspondence to Saket Karve.

Editor information

Editors and Affiliations

Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
P. Nagabhushan
Department of Computer Science and Engineering, CBCS Education, University of Mysore, Mysuru, Karnataka, India
D. S. Guru
Department of Studies in Computer Science, Mangalore University, Mangalore, Karnataka, India
B. H. Shekar
Department of Information Science and Engineering, Maharaja Institute of Technology, Belawadi, Karnataka, India
Y. H. Sharath Kumar

About this paper

Cite this paper

Karve, S., Shende, V., Hople, S. (2019). Semantic Relatedness Measurement from Wikipedia and WordNet Using Modified Normalized Google Distance. In: Nagabhushan, P., Guru, D., Shekar, B., Kumar, Y. (eds) Data Analytics and Learning. Lecture Notes in Networks and Systems, vol 43. Springer, Singapore. https://doi.org/10.1007/978-981-13-2514-4_13

DOI: https://doi.org/10.1007/978-981-13-2514-4_13
Published: 05 November 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2513-7
Online ISBN: 978-981-13-2514-4
eBook Packages: Intelligent Technologies and Robotics (R0)
