A Corpus-Based Approach for the Induction of Ontology Lexica

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7934)

Abstract

While many large knowledge bases (e.g. Freebase, Yago, DBpedia) and linked data sets are available on the web, they typically lack lexical information stating how their properties and classes are realized in natural language. At most one label is usually attached to a property, and this label carries no deeper syntactic information, e.g. about the syntactic arguments and how they map to the semantic arguments of the property, or about possible lexical variants and paraphrases. Lexicon models such as lemon allow one to define a lexicon for a given ontology, but creating and maintaining such lexica is costly and requires substantial manual effort. To lower this effort, in this paper we present a semi-automatic approach that exploits a corpus to find occurrences in which a given property is expressed and generalizes over these occurrences by extracting dependency paths that can serve as a basis for creating lemon lexicon entries. We evaluate the resulting automatically generated lexica with DBpedia as the dataset and Wikipedia as the corresponding corpus, both in an automatic mode, by comparison with a manually created lexicon, and in a semi-automatic mode, in which a lexicon engineer inspected the results of the corpus-based approach and added them to the existing lexicon where appropriate.
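The core step the abstract describes (find sentences in which the subject and object of a property co-occur, then generalize over them via dependency paths) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the corpus is already dependency-parsed into tokens plus (head, dependent, label) edges and that the two entity mentions have already been located. The function names `dependency_path` and `induce_patterns`, the `[subj] ... [obj]` path notation, and the Stanford-style labels in the toy example are hypothetical choices made for illustration.

```python
from collections import Counter, deque
from typing import Dict, List, Optional, Tuple

# Hypothetical input format: each edge is (head_index, dependent_index, label).
Edge = Tuple[int, int, str]
# One corpus occurrence: tokens, edges, index of subject mention, index of object mention.
Occurrence = Tuple[List[str], List[Edge], int, int]


def dependency_path(tokens: List[str], edges: List[Edge],
                    subj_idx: int, obj_idx: int) -> Optional[str]:
    """Return the dependency path from the subject token to the object token, or None."""
    if subj_idx == obj_idx:
        return None
    # Treat the tree as undirected but record each step's direction:
    # ">label" walks head -> dependent, "<label" walks dependent -> head.
    adj: Dict[int, List[Tuple[int, str]]] = {i: [] for i in range(len(tokens))}
    for head, dep, label in edges:
        adj[head].append((dep, ">" + label))
        adj[dep].append((head, "<" + label))

    # Breadth-first search from the subject; in a tree this finds the unique path.
    prev: Dict[int, Optional[Tuple[int, str]]] = {subj_idx: None}
    queue = deque([subj_idx])
    while queue:
        node = queue.popleft()
        if node == obj_idx:
            break
        for nxt, step in adj[node]:
            if nxt not in prev:
                prev[nxt] = (node, step)
                queue.append(nxt)
    if obj_idx not in prev:
        return None  # subject and object are not connected

    # Walk back from the object to the subject, collecting (step, node) pairs.
    steps: List[Tuple[str, int]] = []
    node = obj_idx
    while prev[node] is not None:
        parent, step = prev[node]
        steps.append((step, node))
        node = parent
    steps.reverse()

    # Replace the entity mentions by placeholders; keep the intermediate words.
    parts = ["[subj]"]
    for step, node in steps:
        parts.append(step)
        if node != obj_idx:
            parts.append(tokens[node])
    parts.append("[obj]")
    return " ".join(parts)


def induce_patterns(occurrences: List[Occurrence],
                    min_count: int = 1) -> List[Tuple[str, int]]:
    """Count dependency paths over all occurrences of one property, most frequent first."""
    counts: Counter = Counter()
    for tokens, edges, subj_idx, obj_idx in occurrences:
        path = dependency_path(tokens, edges, subj_idx, obj_idx)
        if path is not None:
            counts[path] += 1
    return [(path, n) for path, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    # Toy, hand-parsed sentences for a birthPlace-like property (illustrative, not real data).
    born = [(2, 0, "nsubjpass"), (2, 1, "auxpass"), (2, 3, "prep"), (3, 4, "pobj")]
    occurrences = [
        (["Obama", "was", "born", "in", "Honolulu"], born, 0, 4),
        (["Einstein", "was", "born", "in", "Ulm"], born, 0, 4),
        (["Ulm", "is", "the", "birthplace", "of", "Einstein"],
         [(3, 0, "nsubj"), (3, 1, "cop"), (3, 2, "det"), (3, 4, "prep"), (4, 5, "pobj")], 5, 0),
    ]
    for path, n in induce_patterns(occurrences):
        print(n, path)
```

On the toy input, the two "was born in" sentences yield the same path, `[subj] <nsubjpass born >prep in >pobj [obj]`, which is counted twice, while the "birthplace of" paraphrase yields a second, less frequent path. Turning such frequent paths into lemon lexicon entries (e.g. a verb frame for "born" mapping its syntactic arguments to the property's subject and object) is a further step not shown here, and the parser, path format and frequency thresholds used in the paper are not specified in this abstract.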

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Walter, S., Unger, C., Cimiano, P. (2013). A Corpus-Based Approach for the Induction of Ontology Lexica. In: Métais, E., Meziane, F., Saraee, M., Sugumaran, V., Vadera, S. (eds) Natural Language Processing and Information Systems. NLDB 2013. Lecture Notes in Computer Science, vol 7934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38824-8_9

  • DOI: https://doi.org/10.1007/978-3-642-38824-8_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38823-1

  • Online ISBN: 978-3-642-38824-8

  • eBook Packages: Computer Science, Computer Science (R0)
