Chapter

The Semantic Web – ISWC 2013

Volume 8218 of the series Lecture Notes in Computer Science, pp. 703-719

Statistical Knowledge Patterns: Identifying Synonymous Relations in Large Linked Datasets

  • Ziqi Zhang, Department of Computer Science, University of Sheffield
  • Anna Lisa Gentile, Department of Computer Science, University of Sheffield
  • Eva Blomqvist, Department of Computer and Information Science, Linköping University
  • Isabelle Augenstein, Department of Computer Science, University of Sheffield
  • Fabio Ciravegna, Department of Computer Science, University of Sheffield

Abstract

The Web of Data is a rich common resource, with billions of triples available in thousands of datasets and individual Web documents created by both expert and non-expert ontologists. A common problem is imprecision in the use of vocabularies: annotators can misunderstand the semantics of a class or property, or may not be able to find the right objects to annotate with. This decreases the quality of the data and may eventually hamper its usability at large scale. This paper describes Statistical Knowledge Patterns (SKPs) as a means to address this issue. SKPs encapsulate key information about ontology classes, including synonymous properties in (and across) datasets, and are automatically generated based on statistical data analysis. SKPs can be used effectively to normalise data automatically, and hence to increase recall in querying. Both pattern extraction and pattern usage are completely automated. The main benefits of SKPs are that: (1) their structure allows for both accurate query expansion and restriction; (2) they are context dependent, hence they describe the usage and meaning of properties in the context of a particular class; and (3) they can be generated offline, hence the equivalence among relations can be used efficiently at run time.
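
To make the idea of class-dependent query expansion concrete, the following is a minimal sketch and not the authors' implementation. It assumes an SKP can be approximated as a mapping from a class to groups of synonymous properties; the `SKP` table, the property names, and the helpers `synonyms` and `expand_query` are all hypothetical and chosen only for illustration. Prefix declarations are omitted from the generated query.

```python
# Hypothetical SKP: for each class, groups of properties treated as synonymous.
# The classes and property names are illustrative, not taken from the paper.
SKP = {
    "dbo:Film": [
        {"dbo:director", "dbp:director", "dbp:directedBy"},
        {"dbo:starring", "dbp:starring"},
    ]
}


def synonyms(skp, cls, prop):
    """Return the sorted synonym group of `prop` in the context of `cls`."""
    for group in skp.get(cls, []):
        if prop in group:
            return sorted(group)
    # No synonyms known for this class: fall back to the property itself.
    return [prop]


def expand_query(skp, cls, prop):
    """Build a SPARQL query whose property pattern is a UNION over synonyms."""
    branches = " UNION ".join(
        "{ ?s %s ?o }" % p for p in synonyms(skp, cls, prop)
    )
    return "SELECT ?s ?o WHERE { ?s a %s . %s }" % (cls, branches)


if __name__ == "__main__":
    # Expanded over three synonymous properties for films.
    print(expand_query(SKP, "dbo:Film", "dbo:director"))
    # No expansion for a class without a matching pattern.
    print(expand_query(SKP, "dbo:Person", "dbo:director"))
```

Because the lookup is keyed by class, the same property is expanded for one class and left untouched for another, which mirrors the context dependence described in benefit (2). Since the table is a precomputed structure, expansion at query time is a simple lookup, in the spirit of benefit (3).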