Chaudron: Extending DBpedia with Measurement

Conference paper

DOI: 10.1007/978-3-319-58068-5_27

Part of the Lecture Notes in Computer Science book series (LNCS, volume 10249)
Cite this paper as:
Subercaze J. (2017) Chaudron: Extending DBpedia with Measurement. In: Blomqvist E., Maynard D., Gangemi A., Hoekstra R., Hitzler P., Hartig O. (eds) The Semantic Web. ESWC 2017. Lecture Notes in Computer Science, vol 10249. Springer, Cham

Abstract

Wikipedia is the largest collaborative encyclopedia and is used as the source for DBpedia, a central dataset of the LOD cloud. Wikipedia contains numerous numerical measures on the entities it describes, as per the general character of the data it encompasses. The DBpedia Information Extraction Framework transforms semi-structured data from Wikipedia into structured RDF. However, this extraction framework offers only limited support for handling measurements in Wikipedia.

In this paper, we describe the automated process that enables the creation of the Chaudron dataset. We propose an alternative to the traditional creation of mappings from the Wikipedia dump: the extraction also uses the rendered HTML, which avoids the template transclusion issue.

This dataset extends DBpedia with more than 3.9 million triples and 949,000 measurements across every domain covered by DBpedia. We define a multi-level approach powered by a formal grammar that proves very robust for the extraction of measurements. An extensive evaluation against DBpedia and Wikidata shows that our approach largely surpasses its competitors for measurement extraction on Wikipedia infoboxes. Chaudron exhibits an F1-score of 0.89, while DBpedia and Wikidata reach 0.38 and 0.10, respectively, on this extraction task.
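As a rough illustration of what grammar-driven measurement extraction can look like, the sketch below parses a value and a unit out of an infobox-style string. It is not the Chaudron grammar, which this abstract does not specify: the toy grammar, the unit list `UNITS`, and the names `Measurement` and `parse_measurement` are all assumptions made for the example.

```python
import re
from typing import NamedTuple, Optional


class Measurement(NamedTuple):
    value: float
    unit: str


# Hypothetical toy grammar (an assumption, not the grammar from the paper):
#   measurement := number whitespace* unit
#   number      := digits, optionally with thousands commas and a decimal part
#   unit        := one of a small illustrative set
UNITS = ("km", "kg", "cm", "mm", "m", "g")

_PATTERN = re.compile(
    r"(?P<number>\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)\s*"
    r"(?P<unit>" + "|".join(UNITS) + r")\b"
)


def parse_measurement(text: str) -> Optional[Measurement]:
    """Return the first value/unit pair found in text, or None if there is none."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting to a float.
    value = float(match.group("number").replace(",", ""))
    return Measurement(value, match.group("unit"))
```

For instance, `parse_measurement("Length: 1,234.5 km")` yields a `Measurement` with value `1234.5` and unit `"km"`, while a string without a recognised unit yields `None`. A real extractor would need far broader coverage of units, ranges, and number formats than this sketch attempts.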

Keywords

Wikipedia, Extraction, DBpedia, Measurement, RDF, Formal grammar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Univ Lyon, UJM-Saint-Etienne, CNRS, Laboratoire Hubert Curien, UMR 5516, Saint-Etienne, France
