
From RDF to Natural Language and Back

  • Chapter
  • First Online:

Abstract

Most knowledge sources on the Data Web were extracted from structured or semi-structured data sources. Thus, they cover only a small fraction of the information available on the document-oriented Web. In this chapter, we present Bootstrapping Linked Data (BOA), a framework that aims to facilitate the extraction of Resource Description Framework (RDF) data from text. The idea behind BOA is to use background knowledge from the Data Web to extract, from unstructured data, natural language patterns that represent predicates found on the Data Web. These patterns are then used to extract instance knowledge from unstructured data sources, and this knowledge can finally be fed back into the Data Web. The approach followed by BOA is quasi-independent of the language in which the corpus is written. We demonstrate our approach by applying it to four different corpora in two different languages, and we evaluate BOA on these data sets using DBpedia as background knowledge. Our results show that we can extract several thousand new facts in one iteration with high accuracy. Moreover, we provide the first multilingual repository of natural language representations (NLR) of predicates found on the Data Web. Finally, we present two applications of the natural language patterns generated by BOA, namely the fact validation framework DeFacto and the question answering engine Template-based SPARQL Learner (TBSL).
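The bootstrapping loop described in the abstract (learn patterns for a predicate from known triples, then apply those patterns to text to obtain new triples) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the two background triples, the three-sentence corpus, the run-of-capitalized-words stand-in for named entity recognition, and the use of a plain occurrence count as the pattern score. BOA itself works against DBpedia and large corpora and scores patterns differently, so this is not the framework's implementation.

```python
import re
from collections import Counter, defaultdict

# Hypothetical background knowledge: (subject label, predicate, object label).
BACKGROUND = [
    ("Albert Einstein", "dbo:birthPlace", "Ulm"),
    ("Angela Merkel", "dbo:birthPlace", "Hamburg"),
]

# Hypothetical unstructured corpus.
CORPUS = [
    "Albert Einstein was born in Ulm in 1879.",
    "Angela Merkel was born in Hamburg.",
    "Max Planck was born in Kiel.",
]

# Crude stand-in for named entity recognition (an assumption of this sketch):
# a run of capitalized tokens separated by single spaces.
ENTITY = r"[A-Z]\w*(?: [A-Z]\w*)*"


def learn_patterns(triples, sentences):
    """Count, per predicate, the text found between a subject label and an object label."""
    patterns = defaultdict(Counter)
    for subj, pred, obj in triples:
        for sentence in sentences:
            s_idx = sentence.find(subj)
            o_idx = sentence.find(obj)
            if s_idx == -1 or o_idx == -1:
                continue
            start = s_idx + len(subj)
            if o_idx <= start:  # this sketch only handles "subject ... object" order
                continue
            between = sentence[start:o_idx].strip()
            if between:
                patterns[pred][between] += 1
    return patterns


def extract_facts(patterns, sentences, known, min_support=2):
    """Apply each pattern seen at least min_support times; keep only triples not already known."""
    facts = set()
    for pred, counts in patterns.items():
        for middle, support in counts.items():
            if support < min_support:
                continue
            regex = re.compile(rf"(?P<s>{ENTITY}) {re.escape(middle)} (?P<o>{ENTITY})")
            for sentence in sentences:
                for m in regex.finditer(sentence):
                    fact = (m.group("s"), pred, m.group("o"))
                    if fact not in known:
                        facts.add(fact)
    return facts


patterns = learn_patterns(BACKGROUND, CORPUS)
new_facts = extract_facts(patterns, CORPUS, set(BACKGROUND))
print(dict(patterns["dbo:birthPlace"]))  # {'was born in': 2}
print(new_facts)  # {('Max Planck', 'dbo:birthPlace', 'Kiel')}
```

The pattern "was born in" is learned from the two known birth places and then yields one triple that was not in the background knowledge, which mirrors the idea of feeding text-derived facts back into the Data Web. Raising `min_support` trades recall for precision: with `min_support=3` the toy pattern is discarded and nothing new is extracted.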


Notes

  1. A demo of the framework can be found at http://boa.aksw.org. The code of the project is at http://boa.googlecode.com.
  2. http://www.alchemyapi.com.
  3. http://rtw.ml.cmu.edu.
  4. http://lemurproject.org/clueweb09.
  5. http://www.w3.org/2011/prov/.
  6. https://github.com/AKSW/DeFacto.
  7. http://www.sc.cit-ec.uni-bielefeld.de/qald.
  8. http://www.foaf-project.org/.
  9. http://www.mpi-inf.mpg.de/yago-naga/yago/.


Author information


Correspondence to Daniel Gerber.


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Gerber, D., Ngomo, AC.N. (2014). From RDF to Natural Language and Back. In: Buitelaar, P., Cimiano, P. (eds) Towards the Multilingual Semantic Web. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43585-4_12


  • DOI: https://doi.org/10.1007/978-3-662-43585-4_12

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43584-7

  • Online ISBN: 978-3-662-43585-4

  • eBook Packages: Computer Science, Computer Science (R0)
