Fuzzy Annotation of Web Data Tables Driven by a Domain Ontology

  • Gaëlle Hignette
  • Patrice Buche
  • Juliette Dibie-Barthélemy
  • Ollivier Haemmerlé
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5554)

Abstract

We propose an automatic system for accurately annotating data tables extracted from the web. This system is designed to provide additional data to an existing querying system called MIEL, which relies on a common vocabulary used to query local relational databases. We use the same vocabulary, translated into an OWL ontology, to annotate the tables. Our annotation system is unsupervised: it uses only the knowledge defined in the ontology to automatically annotate the entire content of tables, following an aggregation approach that first annotates cells, then columns, then relations between those columns. The annotations are fuzzy: instead of linking an element of the table to a single precise concept of the ontology, each element is annotated with several concepts, each associated with a relevance degree. Our annotation process has been validated experimentally on two scientific domains (microbial risk in food, chemical risk in food) and one technical domain (aeronautics).
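The aggregation idea (cells, then columns, then relations, each annotated with degrees rather than a single concept) can be illustrated with a minimal Python sketch. This is not the authors' method: the toy ontology, the Jaccard word-overlap similarity, the averaging of cell degrees, the threshold, and every function name below are assumptions chosen only to make the three-level structure concrete.

```python
from collections import defaultdict

# Hypothetical toy ontology (not from the paper):
# concept -> vocabulary terms; relation -> concepts in its signature.
CONCEPT_TERMS = {
    "Food": ["ground beef", "smoked salmon", "whole milk"],
    "Microorganism": ["listeria monocytogenes", "escherichia coli"],
}
RELATION_SIGNATURES = {
    "GrowthIn": {"Microorganism", "Food"},
}


def term_similarity(a, b):
    """Jaccard overlap of lowercase word sets (an assumed similarity measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def annotate_cell(cell):
    """Fuzzy cell annotation: {concept: best similarity to any of its terms}."""
    return {
        concept: max(term_similarity(cell, term) for term in terms)
        for concept, terms in CONCEPT_TERMS.items()
    }


def annotate_column(cells):
    """Aggregate cell degrees into column degrees by averaging (an assumption)."""
    if not cells:
        return {}
    totals = defaultdict(float)
    for cell in cells:
        for concept, degree in annotate_cell(cell).items():
            totals[concept] += degree
    return {concept: total / len(cells) for concept, total in totals.items()}


def annotate_relations(table, threshold=0.3):
    """Score each relation by the fraction of its signature concepts
    that some column reaches with a degree of at least `threshold`."""
    recognized = set()
    for cells in table.values():
        degrees = annotate_column(cells)
        recognized |= {c for c, d in degrees.items() if d >= threshold}
    return {
        relation: len(signature & recognized) / len(signature)
        for relation, signature in RELATION_SIGNATURES.items()
    }


if __name__ == "__main__":
    table = {
        "Product": ["ground beef", "smoked trout"],
        "Germ": ["Listeria monocytogenes", "Escherichia coli"],
    }
    for name, cells in table.items():
        print(name, annotate_column(cells))
    print(annotate_relations(table))
```

Each level returns a dictionary of degrees instead of committing to one label, so a column that only partly matches a concept still contributes evidence to the relations whose signature includes that concept.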

Keywords

Membership Degree, Domain Ontology, Numeric Type, Annotation Method, Chemical Risk
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Gaëlle Hignette (1)
  • Patrice Buche (1)
  • Juliette Dibie-Barthélemy (1)
  • Ollivier Haemmerlé (1)
  1. INRA/AgroParisTech Unité Mét@risk, Université de Toulouse le Mirail, Paris Cedex 5, France
