Unsupervised Conceptualization and Semantic Text Indexing for Information Extraction

  • Eugen Ruppert
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9678)


The goal of my thesis is to extend the Distributional Hypothesis [13] from the word level to the concept level. To this end, I will develop data-driven methods that build and apply conceptualizations, i.e., taxonomic semantic models grounded in the input corpus. Such conceptualizations can be used to disambiguate all words in the corpus, which makes it possible to extract richer relations and to build a dense graph of semantic relations between concepts. These relations will reduce sparsity issues, a common problem for contextualization techniques. By extending the conceptualization with named entities and multi-word entities (MWE), we can create a Linked Open Data knowledge base that is linked to existing knowledge bases such as Freebase.
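As a point of reference, the word-level Distributional Hypothesis that the thesis starts from can be illustrated with a minimal sketch: words that share many context features are treated as similar. This is not the method of the thesis. The function names `context_features` and `similarity`, the window-based feature definition, and the toy corpus are all assumptions made for illustration.

```python
from collections import defaultdict


def context_features(sentences, window=1):
    """Map each word to the set of (offset, neighbor) features it occurs with.

    Hypothetical helper: a feature is a neighboring token together with its
    relative position inside a small window around the word.
    """
    features = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for offset in range(-window, window + 1):
                j = i + offset
                if offset != 0 and 0 <= j < len(tokens):
                    features[word].add((offset, tokens[j]))
    return features


def similarity(features, a, b):
    """Count the context features shared by words a and b."""
    return len(features.get(a, set()) & features.get(b, set()))


# Toy corpus, invented for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases mice",
    "the dog chases cats",
]
feats = context_features(corpus)

print(similarity(feats, "cat", "dog"))   # shared: the_, _drinks, _chases
print(similarity(feats, "cat", "milk"))  # no shared context features
```

In this toy setting "cat" and "dog" share all three of their context features, while "cat" and "milk" share none. Moving from words to concepts, as proposed in the abstract, would mean computing such features over disambiguated senses rather than surface forms.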


Semantic Model, Context Feature, Word Sense, Word Sense Disambiguation, Relation Extraction



This work has been supported by the German Federal Ministry of Education and Research (BMBF) within the context of the Software Campus project LiCoRes under grant No. 01IS12054. The author would like to thank his mentor Simone Paolo Ponzetto, his advisers Chris Biemann and Martin Riedl, and the reviewers for their valuable feedback.


  1. Akbik, A., Michael, T.: The Weltmodell: a data-driven commonsense knowledge base. In: Proceedings of LREC 2014, Reykjavik, Iceland, pp. 3272–3276 (2014)
  2. Biemann, C.: Ontology learning from text: a survey of methods. LDV forum 20(2), 75–93 (2005)
  3. Biemann, C.: Chinese whispers - an efficient graph clustering algorithm and its application to natural language processing problems. In: Proceedings of TextGraphs-1, New York City, NY, USA, pp. 73–80 (2006)
  4. Biemann, C.: Turk bootstrap word sense inventory 2.0: a large-scale resource for lexical substitution. In: Proceedings of LREC 2012, Istanbul, Turkey, pp. 4038–4042 (2012)
  5. Biemann, C., Riedl, M.: Text: now in 2D! a framework for lexical expansion with contextual similarity. J. Lang. Model. 1(1), 55–95 (2013)
  6. Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of ACM SIGMOD 2008, Vancouver, Canada, pp. 1247–1250 (2008)
  7. Chen, X., Liu, Z., Sun, M.: A unified model for word sense representation and disambiguation. In: Proceedings of EMNLP 2014, Doha, Qatar, pp. 1025–1035 (2014)
  8. Daiber, J., Jakob, M., Hokamp, C., Mendes, P.N.: Improving efficiency and accuracy in multilingual entity extraction. In: Proceedings of I-SEMANTICS 2013, Graz, Austria, pp. 121–124. ACM (2013)
  9. Drymonas, E., Zervanou, K., Petrakis, E.G.M.: Unsupervised ontology acquisition from plain texts: the OntoGain system. In: Hopfe, C.J., Rezgui, Y., Métais, E., Preece, A., Li, H. (eds.) NLDB 2010. LNCS, vol. 6177, pp. 277–287. Springer, Heidelberg (2010)
  10. Etzioni, O., Fader, A., Christensen, J., Soderland, S., Mausam, M.: Open information extraction: the second generation. In: IJCAI, Barcelona, Spain, vol. 11, pp. 3–10 (2011)
  11. Fellbaum, C.: WordNet. An Electronic Lexical Database. MIT Press, Cambridge (1998)
  12. Feuerbach, T., Riedl, M., Biemann, C.: Distributional semantics for resolving bridging mentions. In: Proceedings of RANLP 2015, Hissar, Bulgaria, pp. 192–199 (2015)
  13. Harris, Z.S.: Methods in Structural Linguistics. University of Chicago Press, Chicago (1951)
  14. Hearst, M.A.: Automatic acquisition of hyponyms from large text corpora. In: Proceedings of COLING-1992, Nantes, France, pp. 539–545 (1992)
  15. Iacobacci, I., Pilehvar, M.T., Navigli, R.: SensEmbed: learning sense embeddings for word and relational similarity. In: Proceedings of ACL 2015, Beijing, China, pp. 95–105 (2015)
  16. Klaussner, C., Zhekova, D.: Lexico-syntactic patterns for automatic ontology building. SRW at RANLP 2011, 109–114 (2011)
  17. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Semant. Web J. 6(2), 167–195 (2015)
  18. Lesk, M.: Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In: Proceedings of SIGDOC 1986, pp. 24–26. ACM, Toronto, Ontario, Canada (1986)
  19. Mahdisoltani, F., Biega, J., Suchanek, F.: YAGO3: a knowledge base from multilingual Wikipedias. In: Proceedings of CIDR 2015, Asilomar, CA, USA (2015)
  20. Manandhar, S., Klapaftis, I.P., Dligach, D., Pradhan, S.S.: SemEval-2010 task 14: word sense induction & disambiguation. In: Proceedings of SemEval-2010, Uppsala, Sweden, pp. 63–68 (2010)
  21. Medelyan, O., Manion, S., Broekstra, J., Divoli, A., Huang, A.-L., Witten, I.H.: Constructing a focused taxonomy from a document collection. In: Cimiano, P., Corcho, O., Presutti, V., Hollink, L., Rudolph, S. (eds.) ESWC 2013. LNCS, vol. 7882, pp. 367–381. Springer, Heidelberg (2013)
  22. Moro, A., Raganato, A., Navigli, R.: Entity linking meets word sense disambiguation: a unified approach. TACL 2, 231–244 (2014)
  23. Navigli, R., Vannella, D.: SemEval-2013 task 11: word sense induction and disambiguation within an end-user application. In: Proceedings of *SEM 2013, Atlanta, GA, USA, vol. 2, pp. 193–201 (2013)
  24. Nováček, V., Handschuh, S., Decker, S.: Getting the meaning right: a complementary distributional layer for the web semantics. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N., Blomqvist, E. (eds.) ISWC 2011, Part I. LNCS, vol. 7031, pp. 504–519. Springer, Heidelberg (2011)
  25. Pedersen, T., Patwardhan, S., Michelizzi, J.: WordNet::Similarity: measuring the relatedness of concepts. In: Demonstration Papers at HLT-NAACL 2004, Boston, MA, USA, pp. 38–41 (2004)
  26. Piskorski, J., Yangarber, R.: Information extraction: past, present and future. In: Poibeau, T., Saggion, H., Piskorski, J., Yangarber, R. (eds.) Multi-source, Multilingual Information Extraction and Summarization. Theory and Applications of Natural Language Processing, pp. 23–49. Springer, Heidelberg (2013)
  27. Poon, H., Domingos, P.: Unsupervised ontology induction from text. In: Proceedings of ACL 2010, Uppsala, Sweden, pp. 296–305 (2010)
  28. Remus, S.: Unsupervised relation extraction of in-domain data from focused crawls. In: SRW at EACL 2014, Gothenburg, Sweden, pp. 11–20 (2014)
  29. Riedl, M., Biemann, C.: A single word is not enough: ranking multiword expressions using distributional semantics. In: Proceedings of EMNLP 2015, Lisboa, Portugal, pp. 2430–2440 (2015)
  30. Ruppert, E., Kaufmann, M., Riedl, M., Biemann, C.: JoBimViz: a web-based visualization for graph-based distributional semantic models. In: System Demonstrations at ACL 2015, Beijing, China, pp. 103–108 (2015)
  31. Shinyama, Y., Sekine, S.: Preemptive information extraction using unrestricted relation discovery. In: Proceedings of HLT-NAACL 2006, New York, NY, USA, pp. 304–311 (2006)
  32. Snow, R., Jurafsky, D., Ng, A.Y.: Semantic taxonomy induction from heterogenous evidence. In: Proceedings of COLING/ACL 2006, Sydney, Australia, pp. 801–808 (2006)
  33. Speer, R., Havasi, C.: ConceptNet 5: a large semantic network for relational knowledge. In: Gurevych, I., Kim, J. (eds.) The People's Web Meets NLP. Theory and Applications of Natural Language Processing, pp. 161–176. Springer, Heidelberg (2013)
  34. Treeratpituk, P., Khabsa, M., Giles, C.L.: Graph-based approach to automatic taxonomy generation (grabtax). CoRR abs/1307.1718 (2013)
  35. Wong, W., Liu, W., Bennamoun, M.: Acquiring semantic relations using the web for constructing lightweight ontologies. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS, vol. 5476, pp. 266–277. Springer, Heidelberg (2009)
  36. Yates, A., Cafarella, M., Banko, M., Etzioni, O., Broadhead, M., Soderland, S.: TextRunner: open information extraction on the web. In: System Demonstrations at NAACL 2007, Rochester, NY, USA, pp. 25–26 (2007)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. FG Language Technology, Technische Universität Darmstadt, Darmstadt, Germany
