Query Rewriting Using Datalog for Duplicate Resolution

  • Conference paper
Datalog in Academia and Industry (Datalog 2.0 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7494)

Abstract

Matching Dependencies (MDs) are a recent proposal for declarative entity resolution. They are rules that specify, given the similarities satisfied by values in a database, which values should be considered duplicates and therefore have to be matched. A chase-like procedure for MD enforcement yields clean (duplicate-free) instances, possibly several of them. The clean answers to queries (which we call the resolved answers) are those that are invariant under the resulting class of instances. In this paper, we investigate a query rewriting approach to obtaining the resolved answers (for certain classes of queries and MDs). The rewritten queries are specified in stratified Datalog^{not,s} with aggregation. In addition to the rewriting algorithm, we discuss the semantics of the rewritten queries, and how they could be implemented by means of a DBMS.
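The notion of a resolved answer can be illustrated with a minimal Python sketch. It is not the rewriting algorithm of the paper, which produces Datalog queries; it only mimics the idea on a toy relation of (name, phone) pairs under one hypothetical MD: if two names are similar, their phones must be matched. The `similar` function, the sample rows, and the policy of cleaning a cluster by overwriting its phones with one of its most frequent values are all assumptions made for illustration. Under that policy a tie produces several clean instances, and only values shared by all of them count as resolved.

```python
from collections import Counter
from itertools import combinations


def _norm(s: str) -> str:
    """Lowercase and keep only letters and digits."""
    return "".join(ch for ch in s.lower() if ch.isalnum())


def similar(a: str, b: str) -> bool:
    """Hypothetical similarity relation: equal after normalization."""
    return _norm(a) == _norm(b)


def clusters(rows):
    """Group row indices whose names are (transitively) similar, via union-find."""
    parent = list(range(len(rows)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(rows)), 2):
        if similar(rows[i][0], rows[j][0]):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(rows)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def resolved_phones(rows):
    """Return, per row, the phone shared by every clean instance, else None.

    Assumption: a cluster is cleaned by overwriting its phones with one of its
    most frequent values, so a tie yields several clean instances.
    """
    result = [None] * len(rows)
    for group in clusters(rows):
        counts = Counter(rows[i][1] for i in group)
        best = max(counts.values())
        winners = [v for v, c in counts.items() if c == best]
        if len(winners) == 1:  # the same value appears in every clean instance
            for i in group:
                result[i] = winners[0]
    return result


rows = [
    ("J. Smith", "555-1000"),
    ("j smith", "555-1000"),
    ("JSmith", "555-2000"),
    ("A. Jones", "555-3000"),
    ("a jones", "555-4000"),
]
print(resolved_phones(rows))
# ['555-1000', '555-1000', '555-1000', None, None]
```

In this sketch the three "Smith" rows have a unique most frequent phone, so that value is the same in every clean instance. The two "Jones" rows tie, so neither phone is invariant and no resolved value is reported. The counting step hints at where aggregation could enter a declarative formulation.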

Research supported by the NSERC Strategic Network on Business Intelligence (BIN ADC05) and NSERC/IBM CRDPJ/371084-2008.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gardezi, J., Bertossi, L. (2012). Query Rewriting Using Datalog for Duplicate Resolution. In: Barceló, P., Pichler, R. (eds) Datalog in Academia and Industry. Datalog 2.0 2012. Lecture Notes in Computer Science, vol 7494. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32925-8_10

  • DOI: https://doi.org/10.1007/978-3-642-32925-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32924-1

  • Online ISBN: 978-3-642-32925-8

  • eBook Packages: Computer Science (R0)