Skip to main content

Specifying and computing causes for query answers in databases via database repairs and repair-programs

Abstract

There is a recently established correspondence between database tuples as causes for query answers in databases and tuple-based repairs of inconsistent databases with respect to denial constraints. In this work, answer-set programs that specify database repairs are used as a basis for solving computational and reasoning problems around causality in databases, including causal responsibility. Furthermore, causes are introduced also at the attribute level by appealing to an attribute-based repair semantics that uses null values. Corresponding repair-programs are introduced, and used as a basis for computation and reasoning about attribute-level causes. The answer-set programs are extended in order to capture causality under integrity constraints.

This is a preview of subscription content, access via your institution.

Fig. 1

Notes

  1. We appeal to an SQL semantics for null values, but this is not crucial. What matters is that a null value cannot be used to satisfy a join or make true a comparison that uses a built-in predicate. Any other semantics that is compatible with these assumptions could be used instead.

  2. This version contains a reformulated definition and presentation of null-based repairs, of attribute-based causes, several examples with programs in DVL, and the treatment of causes under integrity constraints.

  3. The variables in the atoms do not have to occur in the indicated order, but their positions should be in correspondence in the two atoms.

  4. That is, satisfying reflexivity, transitivity and anti-symmetry, namely \(D_1 \preceq _D D_2 \text{ and } D_2 \preceq _D D_1 \ \Rightarrow D_1 = D_2\).

  5. These general prioritized repairs based on this kind of priority relations were introduced in [37], where also different priority relations and the corresponding repairs were investigated.

  6. Cf. also [3, Sects. 4 and 5] for an alternative repair-semantics based on both null- and tuple-based repairs w.r.t. general sets of ICs and their repair-programs. They could also be used to define a corresponding notion of cause.

  7. Repairs based on updates of attribute values using other constants of the domain have been considered in [38]. We think the developments in this section could be applied to them.

  8. The condition \(s_i \ne { null}\) in its definition is needed in case the initially given instance already contain nulls.

  9. An alternative, but equivalent formulation can be found in [7].

  10. Our setting allows for a uniform treatment of general and combined DCs, including those with (in)equality and other built-ins, FDs, and key constraints (KCs). However, for KCs in SQL databases, it is common that NULL is disallowed as a value for a key-attribute, among other issues. This prohibition, that we will ignore in this work, can be accommodated in our definition. For a detailed treatment of repairs w.r.t. sets of ICs that include FDs, see [8, Sects. 4 and 5].

  11. It is possible to consider combinations of DCs and FDs, corresponding to UCQs, possibly with \(\ne \), by appealing to the extensions presented in [9].

  12. As opposed to the skeptical or cautious semantics that sanctions as true what is true in all models. Both semantics as supported by the DLV system [31].

  13. In contrast, hard program-constraints, of the form \(\leftarrow { Body}\), eliminate the models where they are violated, i.e. where Body is satisfied. WPCs as those above are sometimes denoted with \(\Leftarrow \ P'(t;{\bar{x}},\mathsf{d})\). If we used the hard program constraint \(\leftarrow \ P'(t;{\bar{x}},\mathsf{d})\) instead of the WPCs, we would be prohibiting tuple deletions. This would result in the empty set of models or just the original D in case the latter is consistent.

  14. It would be easy to consider tids in queries and view definitions, but they do not contribute to the final result and will only complicate the notation. So, we skip tuple ids whenever possible.

  15. Under null-based repairs no tuples are deleted or inserted, so the original tids stay all in the repairs and none is created.

  16. If instead of (24) we had \(\kappa \!: \ \lnot \exists x \exists y \exists z( { Path}(x,y) \wedge { Near}(y,z) \wedge y<3)\), the new rule body could be \({ Path}'(t_1;x,y,{\mathbf {t}}),~ { Near}'(t_2;y,z,{\mathbf {t}}),y < 3\), because \({ null}< 3\) would be evaluated as false.

  17. The proof of this claim is relatively straightforward, but rather long. It follows the same pattern as the proof that tuple-based repairs w.r.t. integrity constraints can be specified by means of disjunctive logic programs with stable model semantics [8, Proposition 6.1].

  18. It is non-monotonic in that its violation view, which captures the tuples that violate it, is defined by a non-monotonic query. Monotonic ICs, i.e., for which a growing database may only produce more violations (e.g. denial constraints and FDs), are not much of an issue in this causality setting with conjunctive queries, because they stay satisfied under counterfactual deletions associated with causes. So here, the most relevant of the usual ICs are non-monotonic.

  19. They should be safe in the sense that a variable in a negative literals has to appear in some positive literal too.

References

  1. Abiteboul S, Hull R, Vianu V (1995) Foundations of databases. Addison-Wesley, Berlin

    MATH  Google Scholar 

  2. Arenas M, Bertossi L, Chomicki J (1999) Consistent query answers in inconsistent databases. In: Proceedings 1999 symposium on principles of database systems, pp 68–79

  3. Arenas M, Bertossi L, Chomicki J (2003) Answer sets for consistent query answers. Theory Pract Log Program 3(4&5):393–424

    MathSciNet  Article  Google Scholar 

  4. Barcelo P, Bertossi L, Bravo L (2003) Characterizing and computing semantically correct answers from databases with annotated logic and answer sets. In: Semantics in databases. LNCS 2582. Springer, Berlin, pp 7–33

  5. Bertossi L, Bravo L, Franconi E et al (2008) The complexity and approximation of fixing numerical attributes in databases under integrity constraints. Inf Syst 33(4):407–434

    Article  Google Scholar 

  6. Bertossi L (2011) Database repairing and consistent query answering. In: Synthesis lectures on data management. Morgan & Claypool, London

  7. Bertossi L, Li L (2013) Achieving data privacy through secrecy views and null-based virtual updates. IEEE Trans Knowl Data Eng 25(5):987–1000

    Article  Google Scholar 

  8. Bertossi L, Bravo L (2017) Consistency and trust in peer data exchange systems. Theory Pract Log Program 17(2):148–204 (Extended version as Corr Arxiv Paper arXiv:1606.01930 [cs.DB])

    MathSciNet  Article  Google Scholar 

  9. Bertossi L, Salimi B (2017) From causes for database queries to repairs and model-based diagnosis and back. Theory Comput Syst 61(1):191–232

    MathSciNet  Article  Google Scholar 

  10. Bertossi L, Salimi B (2017) Causes for query answers from databases: datalog abduction, view-updates, and integrity constraints. Int J Approx Reason 90:226–252

    MathSciNet  Article  Google Scholar 

  11. Bertossi L (2018) Characterizing and computing causes for query answers in databases from database repairs and repair programs. In: Proceedings 2018 international symposium on foundations of information and knowledge systems, LNCS 10833. Springer, Berlin, pp 55–76

  12. Bertossi L (2018) Characterizing causes for query answers in databases via database repairs and their computation through repair programs. Revised and extended version of (Bertossi, 2018), Corr Arxiv Paper arXiv:1712.01001 [cs.DB]

  13. Bertossi L (2018) Measuring and computing database inconsistency via repairs. In: Proceedings 2018 scalable uncertainty management international conference, LNAI 11142. Springer, Berlin, pp 368–372

  14. Bertossi L (2019) Repair-based degrees of database inconsistency: computation and complexity. In: Proceedings 2019 international conference on logic programming and non-monotonic reasoning, LNCS 11481. Springer, Berlin, pp 195–209

  15. Brewka G, Eiter T, Truszczynski M (2011) Answer set programming at a glance. Commun ACM 54(12):93–103

    Article  Google Scholar 

  16. Buccafurri F, Leone N, Rullo P (2000) Enhancing disjunctive datalog by constraints. IEEE Trans Knowl Data Eng 12(5):845–860

    Article  Google Scholar 

  17. Calimeri F, Cozza S, Ianni G (2008) Computable functions in asp: theory and implementation. Proceedings 2008 international conference on logic programming, LNCS 5366. Springer, Berlin, pp 407–424

  18. Calimeri F, Cozza S, Ianni G (2009) An asp system with functions, lists, and sets. In: Proceedings 2009 international conference on logic programming and non-monotonic reasoning, LNCS 5753. Springer, Berlin, pp 483–489

  19. Caniupan-Marileo M, Bertossi L (2010) The consistency extractor system: answer set programs for consistent query answering in databases. Data Knowl Eng 69(6):545–572

    Article  Google Scholar 

  20. Chockler H, Halpern J (2004) Responsibility and blame: a structural-model approach. J Artif Intell Res 22:93–115

    MathSciNet  Article  Google Scholar 

  21. Chomicki J, Marcinkowski J (2005) Minimal-change integrity maintenance using tuple deletions. Inf Comput 197(1–2):90–121

    MathSciNet  Article  Google Scholar 

  22. Chou T, Winslett M (1994) A model-based belief revision system. J Autom Reason 12:157–208

    MathSciNet  Article  Google Scholar 

  23. Dantsin E, Eiter T, Gottlob G et al (2001) Complexity and expressive power of logic programming. ACM Comput Surv 33(3):374–425

    Article  Google Scholar 

  24. Eiter T, Gottlob G, Mannila H (1997) Disjunctive datalog. ACM Trans Datab Syst 22(3):364–418

    Article  Google Scholar 

  25. Eiter T, Ianni G, Lukasiewicz T et al (2008) Combining answer set programming with description logics for the semantic web. Artif Intell 172(12–13):1495–1539

    MathSciNet  Article  Google Scholar 

  26. Faber W, Pfeifer G, Leone N et al (2008) Design and implementation of aggregate functions in the DLV system. Theory Pract Log Program 8(5–6):545–580

    MathSciNet  Article  Google Scholar 

  27. Gebser M, Kaminski R, Schaub T (2011) Complex optimization in answer set programming. Theory Pract Log Program 11(4–5):821–839

    MathSciNet  Article  Google Scholar 

  28. Gebser M, Kaminski R, Kaufmann B et al (2012) Answer set solving in practice. In: Synthesis lectures on artificial intelligence and machine learning. Morgan & Claypool Publishers, London

  29. Gelfond M, Kahl Y (2014) Knowledge representation and reasoning, and the design of intelligent agents. Cambridge University Press, Cambridge

    Book  Google Scholar 

  30. Halpern J, Pearl J (2005) Causes and explanations: a structural-model approach: part 1. Br J Philos Sci 56:843–887

    Article  Google Scholar 

  31. Leone N, Pfeifer G, Faber W et al (2006) The DLV system for knowledge representation and reasoning. ACM Trans Comput Log 7(3):499–562

    MathSciNet  Article  Google Scholar 

  32. Lloyd J (1987) Foundations of logic programming. Springer, Berlin

    Book  Google Scholar 

  33. Lopatenko A, Bertossi L (2007) Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics. In: Proceedings 2007 international conference on database theory. LNCS 4353. Springer, Berlin, pp 179–193

  34. Marek V, Truszczynski M (1998) Revision programming. Theoret Comput Sci 190(2):241–277

    MathSciNet  Article  Google Scholar 

  35. Meliou A, Gatterbauer W, Moore KF et al (2010) The complexity of causality and responsibility for query answers and non-answers. In: Proceedings 2010 very large data bases conference, pp 34–41

  36. Salimi B, Bertossi L, Suciu D et al (2016) Quantifying causal effects on query answering in databases. In: Proceedings 2016 USENIX workshop on the theory and practice of provenance

  37. Staworko S, Chomicki J, Marcinkowski J (2012) Prioritized repairing and consistent query answering in relational databases. Ann Math Artif Intel 64(2–3):209–246

    MathSciNet  Article  Google Scholar 

  38. Wijsen J (2005) Database repairing using updates. ACM Trans Datab Syst 30(3):722–768

    Article  Google Scholar 

Download references

Acknowledgements

Part of this work was done by the author as a Senior Computer Scientist at RelationalAI Inc., and also while the author was spending a sabbatical at the “Database and Artificial Intelligence” Group of the Technical University of Vienna with support from the “Vienna Center for Logic and Algorithms” and the Wolfgang Pauli Society. The author is extremely grateful for their support and hospitality, and especially to Prof. Georg Gottlob for making the stay possible. Many thanks to the anonymous reviewers [11] for their excellent feedback. The valuable help provided by Jordan Li with the examples in DLV is much appreciated. Part of this work was funded by ANID—Millennium Science Initiative Program—Code ICN17_002.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Leopoldo Bertossi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A. Examples of tuple-based causes with DLV-complex

In this section, we show in detail how the examples and repairs-programs extended with causality elements of Sect. 5 can be specified and executed in the DLV-Complex system [17, 18].

Example A.1

(Example 2.3, 3.15.15.3 cont.) The first fragment of the DLV program below, in its non-disjunctive version, shows facts for database D, and the three repair rules for the DC \(\kappa ({\mathcal {Q}})\). In it, and in the rest of this section, predicates +R+, +S+ stand for \({ Receives}\) and \({ Store}\), resp.; and \(\texttt {R}_\texttt {a}\), \(\texttt {S}_\texttt {a}\),...+ stand for \({ Receives}', { Store}',\ldots \) used before, with the subscript \(_\texttt {a}\) for “auxiliary.” We recall that the first attribute of a predicate holds a variable or a constant for a tid; and the last attribute of \(\texttt {R}_\texttt {a}\), etc. holds an annotation constant, +d+ or +s+, for “deleted” (from the database) or “stays” in a repair, resp.

figure b

This is the non-disjunctive version of the repair-program given in disjunctive form in Example 5.1, which in DLV takes the following form: (we will keep using the non-disjunctive versions of these programs)

figure c

DLV returns the stable models of the program, as follows:

figure d

These three stable models (that do not show here the original EDB) are associated to the S-repairs \(D_1, D_2, D_3\) in Example 3.1, resp. As expected from Example 2.3, only tuples with tids 1, 3, 4, 6 are at some point deleted. In particular, the last model corresponds to the C-repair     \(D_1 = \{{ Receives}(s_4,s_3), { Receives}(s_2,s_1), { Receives}(s_3,s_3),\) \( { Store}(s_4), { Store}(s_2)\}\).

Now, to compute causes and their accompanying deleted tuples we add to the program the rules defining \({ Cause}\), in (14) and (15), and \({ CauCont}\), in Example 5.2:

figure e

Next, contingency sets can be computed by means of DLV-Complex, on the basis of the rules defining predicates \({ cause}\) and \({ cauCont}\) above:

figure f

The last two rules play the role of rules (16) and (17), that associate the empty contingency set to counterfactual causes.

The three stable models obtained above will now be extended with \({ cause}\)- and \({ cont}\)-atoms, among others (unless otherwise stated, \({ preCont}\)-, \({ tmpCont}\)-, and \({ HoleIn}\)-atoms will be filtered out from the output); as follows:

figure g

The first two models above show tuple 3 as an actual cause, with one contingency set per each of the models where it appears as a cause. The last line of the third model shows that cause (with tid) 6 is the only counterfactual cause (its contingency set is empty).

The responsibility \(\rho \) can be computed via predicate \({ preRho}(T,N)\), defined in (22), that returns \(N = \frac{1}{\rho }\), that is the inverse of the responsibility, for each tuple with tid T and local to a model that shows T as a cause. We concentrate on the computation of \({ preRho}\) in order to compute with integer numbers, as supported by DLV-Complex, which requires setting an upper integer bound by means of maxint, in this case, at least as large as the largest tid:

figure h

where the local (pre)responsibility of a cause (with tid) T within a repair is obtained by counting how many instances of \({ cauCont}(T,?)\) exist in the model, which is the size of the local contingency set for T plus 1. We obtain the following (filtered) output:

figure i

The first model shows causes 3 and 4 with a pre-rho value of 2. The second one, causes 3 and 1 with a pre-rho value of 2. The last model shows cause 6 with a pre-rho value of 1. This is also a maximum-responsibility cause, actually associated with a C-repair. Inspecting the three models, we can see that the overall pre-responsibility of cause 3 (the minimum of its pre-rho values) is 2, similarly for cause 1. For cause 6 the overall pre-responsibility value is 1.

Now, if we want only maximum-responsibility causes, we add weak program constraints to the program above, to minimize the number of deletions:

figure j

DLV shows only repairs with the least number of deletions, in this case:

figure k

As expected, only repair \(D_1\) is obtained, where only \({ Store}(6,s_3)\) is a cause, and with responsibility 1, making it a maximum-responsibility cause.

Example A.2

(Example 5.4 cont.) We proceed as in Example A.1, with the repair-program being: (again, in non-disjunctive form, but DLV accepts disjunction)

figure l

Now, we define the contingency sets and the local (pre)responsibilities for every cause in each model:

figure m

We obtain the following output, showing four S-repairs, with the last three being C-repairs:

figure n

Cause 5, for example, appears in the first and third repairs, which are an S-repair and C-repair, resp. If we do not want to start inspecting the kinds of repairs where a cause appears, and we haven’t pruned non-C-repairs, then we may pose a query of the form \({ preRho}(5,N)?\) against this program under. This is done by inserting the query at the end of the program (in file +file+), as +preRho(5,N)?+. Then in the command line, one types: +dlv -brave file+, obtaining as expected:

figure o

The least of the returned values will give us the global pre-responsibility value, which can be used to compute tuple 5’s (global) responsibility: \(\frac{1}{2}\). If we want all causes with their local pre-responsibilities, we pose instead the query: preRho(T,N)?.

If, as in Example A.1, we impose weak constraints to obtain only C-repairs:

figure p

we obtain:

figure q

As expected for C-repairs, the local pre-responsibilities for a cause coincide.

B. An example of null-based causes with DLV

Example B.1

(Example 4.2 and 4.6 cont.) In this DLV program S stands for \({ Store}\), and R, for \({ Receives}\). The tuples already contain tuple-ids. The program is similar to that in Example 6.2.

figure r

The query S(T1,X), R(T2,X,Y)? to the program under the brave semantics returns tuples of the form T1 X T2 Y, showing that tuples (with tids) 2,3,4,5 are responsible for the violation of the DC:

figure s

Two stable models are returned, corresponding to two attribute-based repairs:

figure t

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Bertossi, L. Specifying and computing causes for query answers in databases via database repairs and repair-programs. Knowl Inf Syst 63, 199–231 (2021). https://doi.org/10.1007/s10115-020-01516-6

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-020-01516-6

Keywords

  • Causality
  • Databases
  • Repairs
  • Constraints
  • Answer-set programming