Abstract
Matching Dependencies (MDs) are a recent proposal for declarative entity resolution. They are rules that specify, given the similarities satisfied by values in a database, which values should be considered duplicates and therefore have to be matched. A chase-like procedure for MD enforcement produces clean (duplicate-free) instances, possibly several of them. The clean answers to queries, which we call the resolved answers, are those that are invariant under the resulting class of instances. In this paper, we investigate a query rewriting approach to obtaining the resolved answers for certain classes of queries and MDs. The rewritten queries are specified in stratified Datalog^{not,s} with aggregation. In addition to the rewriting algorithm, we discuss the semantics of the rewritten queries and how they could be implemented by means of a DBMS.
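To make the notion of resolved answers concrete, the following is a minimal Python sketch. It is not the rewriting algorithm of the paper: it brute-forces the idea by enumerating clean instances and keeping the answers common to all of them. The relation `people`, the similarity test `similar`, the single MD (similar names imply matched addresses), and the rule that the common value is chosen among the addresses already present in a cluster are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical relation of (name, address) tuples.
people = [
    ("John Doe", "12 Main St"),
    ("J. Doe", "12 Main Street"),
    ("Ann Lee", "5 Oak Ave"),
]

def similar(a, b):
    # Assumed similarity relation: same last token, ignoring case.
    return a.split()[-1].lower() == b.split()[-1].lower()

def clusters(rows):
    # Group row indices whose names are (transitively) similar.
    groups = []
    for i, (name, _) in enumerate(rows):
        hits = [g for g in groups if any(similar(name, rows[j][0]) for j in g)]
        merged = {i}.union(*hits)
        groups = [g for g in groups if g not in hits] + [merged]
    return groups

def clean_instances(rows):
    # Enforce the illustrative MD: rows with similar names get one common address.
    # Assumption: the common value is one of the addresses already in the cluster,
    # so each choice per cluster yields one clean instance.
    groups = clusters(rows)
    choices = [sorted({rows[j][1] for j in g}) for g in groups]
    for pick in product(*choices):
        inst = list(rows)
        for g, addr in zip(groups, pick):
            for j in g:
                inst[j] = (rows[j][0], addr)
        yield set(inst)

def resolved_answers(rows, query):
    # Keep only the answers that appear in every clean instance.
    answer_sets = [query(inst) for inst in clean_instances(rows)]
    return set.intersection(*answer_sets)

# Projection on name is invariant across clean instances.
names = resolved_answers(people, lambda inst: {n for n, _ in inst})
# Full tuples are invariant only where no address choice was needed.
pairs = resolved_answers(people, lambda inst: set(inst))
print(names)
print(pairs)
```

Because the illustrative MD changes only the address attribute, and addresses play no role in the similarity condition, a single enforcement pass suffices in this sketch. Enumerating clean instances is exponential in the number of clusters in general, which is the motivation for answering such queries by rewriting instead.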
Research supported by the NSERC Strategic Network on Business Intelligence (BIN ADC05) and NSERC/IBM CRDPJ/371084-2008.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gardezi, J., Bertossi, L. (2012). Query Rewriting Using Datalog for Duplicate Resolution. In: Barceló, P., Pichler, R. (eds) Datalog in Academia and Industry. Datalog 2.0 2012. Lecture Notes in Computer Science, vol 7494. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32925-8_10
DOI: https://doi.org/10.1007/978-3-642-32925-8_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32924-1
Online ISBN: 978-3-642-32925-8
eBook Packages: Computer Science (R0)