DIADEM: Domains to Databases

  • Tim Furche
  • Georg Gottlob
  • Christian Schallhart
Conference paper

DOI: 10.1007/978-3-642-32600-4_1

Part of the Lecture Notes in Computer Science book series (LNCS, volume 7446)
Cite this paper as:
Furche T., Gottlob G., Schallhart C. (2012) DIADEM: Domains to Databases. In: Liddle S.W., Schewe KD., Tjoa A.M., Zhou X. (eds) Database and Expert Systems Applications. DEXA 2012. Lecture Notes in Computer Science, vol 7446. Springer, Berlin, Heidelberg

Abstract

What if you could turn all websites of an entire domain into a single database? Imagine all real estate offers, all airline flights, or all your local restaurants’ menus automatically collected from hundreds or thousands of real estate agencies, travel agencies, or restaurants, and presented as a single homogeneous dataset.

Historically, this has required tremendous effort from both the data providers and whoever collects the data. Vertical search engines aggregate offers through specific interfaces that provide suitably structured data. The semantic web vision replaces these specific interfaces with a single one, but it still requires providers to publish structured data.

Attempts to turn human-oriented HTML interfaces back into their underlying databases have largely failed due to the variability of web sources. In this paper, we demonstrate that this is about to change: the availability of comprehensive entity recognition, together with advances in ontology reasoning, has made possible a new generation of knowledge-driven, domain-specific data extraction approaches. To that end, we introduce DIADEM, the first automated data extraction system that can turn nearly any website of a domain into structured data, working fully automatically, and we present some preliminary evaluation results.


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Tim Furche (1)
  • Georg Gottlob (1)
  • Christian Schallhart (1)

  1. Department of Computer Science, Oxford University, Oxford, UK
