Extending Functional Dependency to Detect Abnormal Data in RDF Graphs

Yu, Yang; Heflin, Jeff

doi:10.1007/978-3-642-25073-6_50

Yang Yu & Jeff Heflin

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7031)

Included in the following conference series:

International Semantic Web Conference

Abstract

Data quality issues arise in the Semantic Web because data is created by diverse people and/or automated tools. In particular, erroneous triples may result from factual errors in the original data source, the acquisition tools employed, misuse of ontologies, or errors in ontology alignment. We propose that the degree to which a triple deviates from similar triples can be an important heuristic for identifying errors. Inspired by functional dependency, which has shown promise in database data quality research, we introduce value-clustered graph functional dependency to detect abnormal data in RDF graphs. To better handle Semantic Web data, this extends the concept of functional dependency in several respects. First, there is the issue of scale, since we must consider the whole data schema instead of being restricted to one database relation. Second, it deals with multi-valued properties, whose values lack the explicit correlations that tuples provide in databases. Third, it uses clustering to consider classes of values. Focusing on these characteristics, we propose a number of heuristics and algorithms to efficiently discover the extended dependencies and use them to detect abnormal data. Experiments show that the system is efficient on multiple data sets and also detects many quality problems in real-world data.
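
The core intuition, that a triple deviating from similar triples is a candidate error, can be illustrated with a small sketch. This is not the authors' algorithm: it checks a single, hand-picked dependency between two properties, uses a plain majority vote, and omits the value clustering and schema-wide dependency discovery that the abstract describes. The function name `find_abnormal_triples`, the `min_support` threshold, and the toy `city`/`state` data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def find_abnormal_triples(triples, determinant, dependent, min_support=0.75):
    """Flag (subject, dependent, value) triples that disagree with the
    majority `dependent` value observed for the same `determinant` value.

    Hypothetical helper for illustration only; not the method from the paper.
    """
    # Index values per subject and property. Sets are used because RDF
    # properties may be multi-valued.
    values = defaultdict(lambda: defaultdict(set))
    for subject, prop, obj in triples:
        values[subject][prop].add(obj)

    # Count how often each dependent value co-occurs with each determinant value.
    cooccurrence = defaultdict(Counter)
    for props in values.values():
        for x in props.get(determinant, ()):
            for y in props.get(dependent, ()):
                cooccurrence[x][y] += 1

    abnormal = []
    for subject, props in values.items():
        for x in props.get(determinant, ()):
            counts = cooccurrence[x]
            total = sum(counts.values())
            if total == 0:
                continue  # no dependent values seen for this determinant value
            majority, majority_count = counts.most_common(1)[0]
            if majority_count / total < min_support:
                continue  # the dependency is too weak here to judge deviations
            for y in props.get(dependent, ()):
                if y != majority:
                    abnormal.append((subject, dependent, y))
    return sorted(abnormal)


# Toy data (assumed): four subjects agree that Bethlehem is in PA, one disagrees.
example_triples = [
    ("a", "city", "Bethlehem"), ("a", "state", "PA"),
    ("b", "city", "Bethlehem"), ("b", "state", "PA"),
    ("c", "city", "Bethlehem"), ("c", "state", "PA"),
    ("d", "city", "Bethlehem"), ("d", "state", "PA"),
    ("e", "city", "Bethlehem"), ("e", "state", "NY"),
]
print(find_abnormal_triples(example_triples, "city", "state"))
```

Exact-match voting like this breaks down for numeric or near-duplicate literal values, which is one motivation for grouping values into classes before testing a dependency.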

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Lehigh University, 19 Memorial Drive West, Bethlehem, PA, 18015, USA
Yang Yu & Jeff Heflin

Editor information

Editors and Affiliations

Computer Science Dept., VU University Amsterdam, De Boelelaan 1081, 1081 HV, Amsterdam, The Netherlands
Lora Aroyo
IBM Research, 10598, Yorktown Heights, NY, USA
Chris Welty
The Open University, Walton Hall, MK7 6AA, Milton Keynes, UK
Harith Alani
Google, USA
Jamie Taylor
University of Zurich, Binzmuehlestrasse 14, 8050, Zurich, Switzerland
Abraham Bernstein
Massachusetts Institute of Technology, 32 Vassar Street, 02139, Cambridge, MA, USA
Lalana Kagal
Stanford University, 94305, Stanford, CA, USA
Natasha Noy
Linköping University, 581 83, Linköping, Sweden
Eva Blomqvist

Cite this paper

Yu, Y., Heflin, J. (2011). Extending Functional Dependency to Detect Abnormal Data in RDF Graphs. In: Aroyo, L., et al. The Semantic Web – ISWC 2011. ISWC 2011. Lecture Notes in Computer Science, vol 7031. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25073-6_50

DOI: https://doi.org/10.1007/978-3-642-25073-6_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25072-9
Online ISBN: 978-3-642-25073-6
eBook Packages: Computer Science, Computer Science (R0)
