An Ontology-Based Method for Duplicate Detection in Web Data Tables

  • Patrice Buche
  • Juliette Dibie-Barthélemy
  • Rania Khefifi
  • Fatiha Saïs
Conference paper

DOI: 10.1007/978-3-642-23088-2_38

Part of the Lecture Notes in Computer Science book series (LNCS, volume 6860)
Cite this paper as:
Buche P., Dibie-Barthélemy J., Khefifi R., Saïs F. (2011) An Ontology-Based Method for Duplicate Detection in Web Data Tables. In: Hameurlain A., Liddle S.W., Schewe KD., Zhou X. (eds) Database and Expert Systems Applications. DEXA 2011. Lecture Notes in Computer Science, vol 6860. Springer, Berlin, Heidelberg

Abstract

We present a method for detecting duplicates in semantically annotated Web data tables, driven by a domain Termino-Ontological Resource (TOR). The method relies on fuzzy semantic annotations, one of which is automatically associated with each row of a Web data table. Each annotation instantiates a composed concept of the domain TOR, which represents the semantic n-ary relationship that exists between the columns of the table, and contains fuzzy values expressed as fuzzy sets. The proposed method automatically detects pairs of duplicate fuzzy semantic annotations and relies on (i) knowledge declared in the domain TOR and (ii) similarity measures between fuzzy sets. Two new similarity measures are defined to compare both the symbolic and the numerical fuzzy values. The method has been tested on a real application in the domain of chemical risk in food.
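
To make the idea of comparing annotations through fuzzy-set similarity concrete, the sketch below shows one possible shape of such a pipeline. It is not the paper's method: the abstract does not give the two new similarity measures, so a generic fuzzy Jaccard measure (sum of minimum memberships over sum of maximum memberships) is swapped in, and TOR knowledge is not modelled at all. The names `fuzzy_jaccard`, `annotation_similarity`, `find_duplicate_pairs`, the 0.8 threshold, and the sample rows are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Assumption: a symbolic fuzzy value is a mapping element -> membership in [0, 1].
FuzzySet = Dict[str, float]
# Assumption: an annotation maps each column (concept) name to a fuzzy value.
Annotation = Dict[str, FuzzySet]


def fuzzy_jaccard(a: FuzzySet, b: FuzzySet) -> float:
    """Generic fuzzy Jaccard similarity (a stand-in, not the paper's measure)."""
    keys = set(a) | set(b)
    inter = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    union = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    # Two empty fuzzy sets are treated as identical.
    return inter / union if union > 0 else 1.0


def annotation_similarity(x: Annotation, y: Annotation) -> float:
    """Average the per-column similarities over all columns seen in either row."""
    columns = set(x) | set(y)
    if not columns:
        return 1.0
    scores = [fuzzy_jaccard(x.get(c, {}), y.get(c, {})) for c in columns]
    return sum(scores) / len(scores)


def find_duplicate_pairs(
    rows: List[Annotation], threshold: float = 0.8
) -> List[Tuple[int, int]]:
    """Return index pairs whose similarity reaches the (hypothetical) threshold."""
    return [
        (i, j)
        for (i, x), (j, y) in combinations(enumerate(rows), 2)
        if annotation_similarity(x, y) >= threshold
    ]


# Hypothetical annotated rows, for illustration only.
rows = [
    {"food": {"rice": 1.0}, "contaminant": {"ochratoxin A": 1.0}},
    {"food": {"rice": 1.0}, "contaminant": {"ochratoxin A": 1.0}},
    {"food": {"wheat": 1.0}, "contaminant": {"ochratoxin A": 1.0}},
]
duplicates = find_duplicate_pairs(rows)
```

With these sample rows, only the first two annotations are flagged as a duplicate pair, since the third differs on the `food` column. A method following the abstract would replace the stand-in measure with the paper's own measures for symbolic and numerical fuzzy values and would also use knowledge declared in the TOR.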

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Patrice Buche (1, 2)
  • Juliette Dibie-Barthélemy (3)
  • Rania Khefifi (4)
  • Fatiha Saïs (4)

  1. INRA - UMR IATE, Montpellier Cedex 2, France
  2. LIRMM, CNRS-UM2, Montpellier, France
  3. INRA - Mét@risk & AgroParisTech, Paris Cedex 5, France
  4. LRI (CNRS & Paris-Sud 11 University)/INRIA Saclay, Orsay Cedex, France
