Semantic Relatedness Measurement from Wikipedia and WordNet Using Modified Normalized Google Distance

  • Conference paper

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 43)

Abstract

Many applications in natural language processing require the semantic relatedness between words to be quantified. Existing WordNet-based approaches fail for non-dictionary words, jargon, and some proper nouns. The meanings of terms also evolve over the years, and these changes are not reflected in WordNet. However, WordNet cannot be ignored, as it captures the semantics of the language along with contextual meaning. Hence, we propose a method that uses data from Wikipedia and WordNet’s Brown corpus to calculate semantic relatedness using a modified form of the Normalized Google Distance (NGD). The modified NGD incorporates word sense derived from WordNet and occurrence counts over the data from Wikipedia. Through experiments performed on a set of selected word pairs, we found that the proposed method calculates relatedness that correlates significantly with human intuition.
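For orientation, the standard (unmodified) NGD of Cilibrasi and Vitányi can be sketched as below. This is not the modified measure proposed in the paper, whose details the abstract does not give. The function name `ngd`, its parameter names, and the counts in the usage lines are illustrative assumptions: `fx` and `fy` stand for the number of documents containing each term, `fxy` for the number containing both, and `n` for the total number of indexed documents, assumed larger than either count.

```python
import math


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Standard Normalized Google Distance from raw occurrence counts.

    fx, fy: documents containing term x / term y (hypothetical counts)
    fxy:    documents containing both terms
    n:      total number of documents, assumed larger than fx and fy
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never occur (or never co-occur) are treated as unrelated.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Illustrative counts only, not data from the paper.
always_together = ngd(100, 100, 100, 10_000)   # terms always co-occur
rarely_together = ngd(1_000, 100, 10, 1_000_000)
never_together = ngd(1_000, 100, 0, 1_000_000)
```

A smaller value indicates stronger relatedness: terms that always co-occur get a distance of 0, and the value grows as co-occurrence becomes rarer relative to the individual counts.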

Author information

Corresponding author

Correspondence to Saket Karve.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Karve, S., Shende, V., Hople, S. (2019). Semantic Relatedness Measurement from Wikipedia and WordNet Using Modified Normalized Google Distance. In: Nagabhushan, P., Guru, D., Shekar, B., Kumar, Y. (eds) Data Analytics and Learning. Lecture Notes in Networks and Systems, vol 43. Springer, Singapore. https://doi.org/10.1007/978-981-13-2514-4_13
