Abstract
Many applications in natural language processing require the semantic relatedness between words to be quantified. Existing WordNet-based approaches fail for non-dictionary words, jargon, and some proper nouns. The meanings of terms also evolve over the years, and these changes have not been reflected in WordNet. However, WordNet cannot be ignored, as it considers the semantics of the language along with its contextual meaning. Hence, we propose a method which uses data from Wikipedia and WordNet’s Brown corpus to calculate semantic relatedness using a modified form of Normalized Google Distance (NGD). The modified NGD incorporates word sense derived from WordNet and term occurrence in the data from Wikipedia. In experiments on a set of selected word pairs, we found that the proposed method calculates relatedness that correlates significantly with human intuition.
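For orientation, the following is a minimal sketch of the standard, unmodified NGD of Cilibrasi and Vitanyi, computed from occurrence counts. It is not the modified form proposed in this paper, whose details the abstract does not give. The function name `ngd` and its parameters (`fx`, `fy`, `fxy`, `n`) are illustrative assumptions, and the counts in the usage line are made-up numbers rather than Wikipedia statistics.

```python
import math


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Standard Normalized Google Distance from raw occurrence counts.

    fx, fy : number of documents containing term x / term y
    fxy    : number of documents containing both terms
    n      : total number of documents in the corpus (assumed known)
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term must occur at least once")
    if fxy <= 0:
        # Terms that never co-occur are treated as maximally distant.
        return math.inf

    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)
    if denominator <= 0:
        raise ValueError("n must exceed the smaller of the two term counts")

    return (max(log_fx, log_fy) - log_fxy) / denominator


# Hypothetical counts, for illustration only.
print(ngd(fx=1000, fy=100, fxy=10, n=1_000_000))
```

A lower value indicates that the two terms tend to occur together more often than their individual frequencies would suggest; a value of 0 means they always co-occur.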
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Karve, S., Shende, V., Hople, S. (2019). Semantic Relatedness Measurement from Wikipedia and WordNet Using Modified Normalized Google Distance. In: Nagabhushan, P., Guru, D., Shekar, B., Kumar, Y. (eds) Data Analytics and Learning. Lecture Notes in Networks and Systems, vol 43. Springer, Singapore. https://doi.org/10.1007/978-981-13-2514-4_13
DOI: https://doi.org/10.1007/978-981-13-2514-4_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2513-7
Online ISBN: 978-981-13-2514-4
eBook Packages: Intelligent Technologies and Robotics (R0)