Linked Disambiguated Distributional Semantic Networks

  • Stefano Faralli
  • Alexander Panchenko
  • Chris Biemann
  • Simone P. Ponzetto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9982)

Abstract

We present a new hybrid lexical knowledge base that combines the contextual information of distributional models with the conciseness and precision of manually constructed lexical networks. The computation of our count-based distributional model includes the induction of word senses for single-word and multi-word terms, the disambiguation of word similarity lists, the extraction of taxonomic relations by patterns, and the collection of context clues for disambiguation in context. In contrast to dense vector representations, our resource is human-readable and interpretable, and can thus be easily embedded within the Semantic Web ecosystem.

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Stefano Faralli (1)
  • Alexander Panchenko (2)
  • Chris Biemann (2)
  • Simone P. Ponzetto (1)

  1. Data and Web Science Group, University of Mannheim, Mannheim, Germany
  2. Language Technology Group, TU Darmstadt, Darmstadt, Germany
