Towards Web-Scale Collaborative Knowledge Extraction

The People’s Web Meets NLP

Abstract

While the Web of Data, the Web of Documents and Natural Language Processing are well-researched individual fields, approaches that combine all three are fragmented and not yet well aligned. This chapter analyzes current efforts in collaborative knowledge extraction to uncover connection points between the three fields. It focuses on three prominent RDF data sets (DBpedia, LinkedGeoData and Wiktionary2RDF), which allow users to influence the knowledge extraction process by adding another crowd-sourced layer on top. The recently published NLP Interchange Format (NIF) provides a way to annotate textual resources on the Web by assigning URIs with fragment identifiers. We will show how this formalism can easily be extended to encompass new annotation layers and vocabularies.
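
The idea of addressing a piece of text through a URI with a fragment identifier can be illustrated with a minimal Python sketch. It assumes a character-range fragment of the form `char=begin,end` in the style of RFC 5147; the abstract does not spell out NIF's own URI scheme, which may differ. The function names `annotate_span` and `resolve_span`, the sample sentence and the `example.org` document URI are hypothetical.

```python
from urllib.parse import urldefrag


def annotate_span(document_uri: str, text: str, begin: int, end: int) -> str:
    """Build a URI that identifies text[begin:end] by a character-range fragment.

    Assumption: an RFC 5147-style 'char=begin,end' fragment, not necessarily
    the scheme used by NIF itself.
    """
    if not 0 <= begin <= end <= len(text):
        raise ValueError("span lies outside the text")
    base, _ = urldefrag(document_uri)  # drop any fragment already present
    return f"{base}#char={begin},{end}"


def resolve_span(span_uri: str, text: str) -> str:
    """Resolve the fragment on the client side against the document text."""
    _, fragment = urldefrag(span_uri)
    scheme, _, positions = fragment.partition("=")
    if scheme != "char":
        raise ValueError(f"unsupported fragment scheme: {scheme!r}")
    begin, end = (int(p) for p in positions.split(","))
    return text[begin:end]


# Hypothetical document and text, used only for illustration.
text = "Leipzig is a city in Germany."
uri = annotate_span("http://example.org/doc.txt", text, 0, 7)
print(uri)                      # http://example.org/doc.txt#char=0,7
print(resolve_span(uri, text))  # Leipzig
```

Because the span is carried in the fragment, the server only ever sees the document URI, and the client resolves the character range against the retrieved text.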

Notes

  1. http://dig.csail.mit.edu/breadcrumbs/node/215
  2. http://www4.wiwiss.fu-berlin.de/lodcloud/state/
  3. http://www.zemanta.com/
  4. http://www.opencalais.com/
  5. http://www.ontos.com/
  6. http://enrycher.ijs.si/
  7. http://extractiv.com/
  8. http://www.alchemyapi.com/
  9. http://aksw.org/Projects/FOX
  10. http://dbpedia.org
  11. http://mappings.dbpedia.org/
  12. http://linkedgeodata.org
  13. More data sets can be explored here: http://thedatahub.org/tag/published-by-third-party
  14. http://www.anc.org/OANC/
  15. http://richard.cyganiak.de/2007/10/lod/#open
  16. http://www4.wiwiss.fu-berlin.de/lodcloud/state/#license
  17. http://opendefinition.org/
  18. http://factforge.net or http://lod.openlinksw.com provide SPARQL interfaces to query billions of aggregated facts.
  19. http://swoogle.umbc.edu
  20. http://sindice.com
  21. http://mappings.dbpedia.org
  22. http://mappings.dbpedia.org
  23. For DBpedia Live see http://live.dbpedia.org/
  24. http://wiki.dbpedia.org/Lexicalizations
  25. http://openstreetmap.org
  26. http://s23.org/wikistats/wiktionaries_html.php
  27. See http://en.wiktionary.org/wiki/semantic for a simple example page
  28. http://wiktionary.dbpedia.org/
  29. http://www.mediawiki.org/wiki/Markup_spec
  30. http://dumps.wikimedia.org/backup-index.html
  31. http://wiki.dbpedia.org/Documentation
  32. http://en.wiktionary.org/wiki/Template:senseid
  33. http://stats.wikimedia.org/wiktionary/EN/TablesWikipediaEN.htm
  34. http://en.wiktionary.org/wiki/Wiktionary:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License
  35. http://en.wiktionary.org/wiki/Wiktionary:GNU_Free_Documentation_License
  36. http://meta.wikimedia.org/wiki/Template:Wikimedia_Growth
  37. http://stats.wikimedia.org/wiktionary/EN/TablesWikipediaEN.htm
  38. For English see http://en.wiktionary.org/wiki/Wiktionary:ELE
  39. http://downloads.dbpedia.org/wiktionary
  40. for example http://wiktionary.dbpedia.org/resource/dog
  41. http://wiktionary.dbpedia.org/fct
  42. http://wiktionary.dbpedia.org/sparql
  43. Note that with ‘/’ the identifier is sent to the server during a request (e.g. Linked Data), while everything after ‘#’ can only be processed by the client.
  44. http://www.unicode.org/reports/tr15/#Norm_Forms
  45. http://unicode.org/faq/char_combmark.html#7
  46. for the resolution of prefixes, we refer the reader to http://prefix.cc
  47. http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#urisandlit
  48. http://www.w3.org/TR/2004/REC-rdf-mt-20040210/#intro
  49. http://lists.w3.org/Archives/Public/www-rdf-comments/2002JanMar/0127.html
  50. http://purl.org/olia
  51. http://purl.org/olia/penn.owl
  52. http://www.comp.leeds.ac.uk/amalgam/tagsets/upenn.html
  53. http://sourceforge.net/projects/olia/
  54. http://www.w3.org/International/multilingualweb/lt/
  55. http://www.w3.org/TR/2012/WD-its20-20120829/
  56. http://stanbol.apache.org
  57. http://tools.ietf.org/html/rfc1737

Acknowledgements

We would like to thank our colleagues from the AKSW research group and the LOD2 project for their helpful comments during the development of NIF. In particular, we thank Christian Chiarcos for his support with OLiA and Jonas Brekle for his work on Wiktionary2RDF. This work was partially supported by a grant from the European Union’s 7th Framework Programme provided for the project LOD2 (GA no. 257943).

Author information

Corresponding authors

Correspondence to Sebastian Hellmann or Sören Auer.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hellmann, S., Auer, S. (2013). Towards Web-Scale Collaborative Knowledge Extraction. In: Gurevych, I., Kim, J. (eds) The People’s Web Meets NLP. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35085-6_11

  • DOI: https://doi.org/10.1007/978-3-642-35085-6_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35084-9

  • Online ISBN: 978-3-642-35085-6

  • eBook Packages: Computer Science (R0)
