
Vector Space Representation of Concepts Using Wikipedia Graph Structure

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10260)


Abstract

We introduce a vector space representation of concepts, built from the Wikipedia graph structure, for computing semantic relatedness. The method starts from the neighborhood graph of a concept as its primary representation and maps this graph into a vector space to obtain the final representation. The proposed method achieves state-of-the-art results on several relatedness datasets.
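The paragraph above describes a two-step pipeline: build a concept's neighborhood graph, then turn that graph into a vector whose cosine with another concept's vector gives a relatedness score. The Python sketch below illustrates that pipeline under stated assumptions: the toy `LINKS` graph and all helper names are hypothetical, the neighborhood is assumed to be the concept's direct in- and out-links, and node weights come from a personalized PageRank restricted to that neighborhood. Personalized PageRank is swapped in here as one plausible weighting scheme; it is not necessarily the weighting used in the paper.

```python
import math

# Hypothetical toy link graph: page -> set of pages it links to.
# A real system would read these links from a Wikipedia dump.
LINKS = {
    "Jaguar": {"Cat", "Felidae", "South_America"},
    "Leopard": {"Cat", "Felidae", "Africa"},
    "Cat": {"Felidae"},
    "Felidae": {"Cat"},
    "South_America": set(),
    "Africa": set(),
    "Jaguar_Cars": {"Car", "United_Kingdom"},
    "Car": set(),
    "United_Kingdom": set(),
}


def neighborhood(concept, links):
    """Assumed neighborhood: the concept plus its direct out-links and in-links."""
    inlinks = {src for src, dsts in links.items() if concept in dsts}
    return {concept} | links.get(concept, set()) | inlinks


def concept_vector(concept, links, alpha=0.85, iters=50):
    """Personalized PageRank (seeded at `concept`) on the induced neighborhood
    subgraph; the resulting node scores form a sparse vector keyed by page."""
    nodes = neighborhood(concept, links)
    out = {n: [m for m in links.get(n, ()) if m in nodes] for n in nodes}
    score = {n: (1.0 if n == concept else 0.0) for n in nodes}
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[concept] += 1 - alpha  # teleport back to the seed concept
        for n in nodes:
            if out[n]:
                share = alpha * score[n] / len(out[n])
                for m in out[n]:
                    nxt[m] += share
            else:
                # Dangling node: return its mass to the seed concept.
                nxt[concept] += alpha * score[n]
        score = nxt
    return score


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def relatedness(a, b, links=LINKS):
    return cosine(concept_vector(a, links), concept_vector(b, links))
```

On this toy graph, `relatedness("Jaguar", "Leopard")` is positive because both neighborhoods contain `Cat` and `Felidae`, while `relatedness("Jaguar", "Jaguar_Cars")` is 0.0 because the two neighborhoods share no nodes.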

Combining the vector space representation with the standard coherence model, we show that the proposed relatedness method performs well in Word Sense Disambiguation (WSD). We then propose an alternative formulation of coherence to demonstrate that, in a sufficiently short sentence, there is one key entity that can help disambiguate every other entity. Building on this finding, we present a vector-space-based method that can outperform the standard coherence model with a significantly shorter computation time.
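To make the contrast between the two coherence formulations concrete, the following Python sketch compares an exhaustive joint-coherence search with a key-entity search. Everything in it is a hypothetical stand-in: the candidate senses and the `PAIR_SCORES` relatedness table are invented toy values (in practice the scores would come from a relatedness measure), and scoring a key entity by the summed relatedness of the senses it selects is an assumption, not the paper's exact objective. The paper's vector-space-based variant is not reproduced here.

```python
from itertools import product

# Hypothetical candidate senses for each mention in a short sentence.
CANDIDATES = {
    "jaguar": ["Jaguar", "Jaguar_Cars"],
    "leopard": ["Leopard", "Leopard_(tank)"],
    "savanna": ["Savanna"],
}

# Hypothetical symmetric relatedness scores; unlisted pairs score 0.
PAIR_SCORES = {
    frozenset({"Jaguar", "Leopard"}): 0.8,
    frozenset({"Jaguar", "Savanna"}): 0.5,
    frozenset({"Leopard", "Savanna"}): 0.6,
    frozenset({"Jaguar_Cars", "Leopard_(tank)"}): 0.3,
}


def rel(a, b):
    return PAIR_SCORES.get(frozenset({a, b}), 0.0)


def standard_coherence(candidates):
    """Joint assignment maximizing the sum of pairwise relatedness.
    Exhaustive search: cost is the product of the candidate-list sizes."""
    mentions = list(candidates)
    best, best_score = None, float("-inf")
    for combo in product(*(candidates[m] for m in mentions)):
        score = sum(rel(a, b) for i, a in enumerate(combo) for b in combo[i + 1:])
        if score > best_score:
            best, best_score = dict(zip(mentions, combo)), score
    return best


def key_entity(candidates):
    """Try each (mention, sense) pair as the key entity; every other mention
    takes its candidate most related to the key. Keep the best-scoring key."""
    best, best_score = None, float("-inf")
    for key_mention, senses in candidates.items():
        for key_sense in senses:
            assignment, score = {key_mention: key_sense}, 0.0
            for m, cands in candidates.items():
                if m == key_mention:
                    continue
                choice = max(cands, key=lambda c: rel(key_sense, c))
                assignment[m] = choice
                score += rel(key_sense, choice)
            if score > best_score:
                best, best_score = assignment, score
    return best
```

Both functions return `{"jaguar": "Jaguar", "leopard": "Leopard", "savanna": "Savanna"}` on this toy input. The joint search scores every combination of senses, so its cost grows as the product of the candidate-list sizes; the key-entity search only compares each candidate sense against the other mentions' candidates, so its cost grows roughly quadratically in the total number of candidates, which is one way to see why a key-entity formulation can be much cheaper.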


Notes

  1. http://ixa2.si.ehu.es/ukb/.

  2. We use the Wikipedia 20160305 dump for relatedness.

  3. http://cgm6.research.cs.dal.ca/~sajadi/wikisim/.

  4. The dataset is publicly available on the project website.


Acknowledgments

This research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Boeing Company, and Mitacs. We would also like to thank Jeannette C.M. Janssen for comments that greatly improved the manuscript.

Author information


Correspondence to Armin Sajadi.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Sajadi, A., Milios, E.E., Keselj, V. (2017). Vector Space Representation of Concepts Using Wikipedia Graph Structure. In: Frasincar, F., Ittoo, A., Nguyen, L., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2017. Lecture Notes in Computer Science, vol 10260. Springer, Cham. https://doi.org/10.1007/978-3-319-59569-6_48


  • DOI: https://doi.org/10.1007/978-3-319-59569-6_48


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59568-9

  • Online ISBN: 978-3-319-59569-6

  • eBook Packages: Computer Science, Computer Science (R0)
