Specifying and computing causes for query answers in databases via database repairs and repair-programs

Bertossi, Leopoldo

doi:10.1007/s10115-020-01516-6

Specifying and computing causes for query answers in databases via database repairs and repair-programs

Regular Paper
Published: 03 November 2020

Volume 63, pages 199–231, (2021)
Cite this article

Knowledge and Information Systems Aims and scope Submit manuscript

Leopoldo Bertossi ORCID: orcid.org/0000-0002-1144-3179^1,2

202 Accesses
8 Citations
Explore all metrics

Abstract

There is a recently established correspondence between database tuples as causes for query answers in databases and tuple-based repairs of inconsistent databases with respect to denial constraints. In this work, answer-set programs that specify database repairs are used as a basis for solving computational and reasoning problems around causality in databases, including causal responsibility. Furthermore, causes are introduced also at the attribute level by appealing to an attribute-based repair semantics that uses null values. Corresponding repair-programs are introduced, and used as a basis for computation and reasoning about attribute-level causes. The answer-set programs are extended in order to capture causality under integrity constraints.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Design methods for the new database era: a systematic literature review

Article 27 June 2019

A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration

Article Open access 15 January 2021

Comparing Oracle and PostgreSQL, Performance and Optimization

Notes

We appeal to an SQL semantics for null values, but this is not crucial. What matters is that a null value cannot be used to satisfy a join or make true a comparison that uses a built-in predicate. Any other semantics that is compatible with these assumptions could be used instead.
This version contains a reformulated definition and presentation of null-based repairs, of attribute-based causes, several examples with programs in DVL, and the treatment of causes under integrity constraints.
The variables in the atoms do not have to occur in the indicated order, but their positions should be in correspondence in the two atoms.
That is, satisfying reflexivity, transitivity and anti-symmetry, namely \(D_1 \preceq _D D_2 \text{ and } D_2 \preceq _D D_1 \ \Rightarrow D_1 = D_2\).
These general prioritized repairs based on this kind of priority relations were introduced in [37], where also different priority relations and the corresponding repairs were investigated.
Cf. also [3, Sects. 4 and 5] for an alternative repair-semantics based on both null- and tuple-based repairs w.r.t. general sets of ICs and their repair-programs. They could also be used to define a corresponding notion of cause.
Repairs based on updates of attribute values using other constants of the domain have been considered in [38]. We think the developments in this section could be applied to them.
The condition \(s_i \ne { null}\) in its definition is needed in case the initially given instance already contain nulls.
An alternative, but equivalent formulation can be found in [7].
Our setting allows for a uniform treatment of general and combined DCs, including those with (in)equality and other built-ins, FDs, and key constraints (KCs). However, for KCs in SQL databases, it is common that NULL is disallowed as a value for a key-attribute, among other issues. This prohibition, that we will ignore in this work, can be accommodated in our definition. For a detailed treatment of repairs w.r.t. sets of ICs that include FDs, see [8, Sects. 4 and 5].
It is possible to consider combinations of DCs and FDs, corresponding to UCQs, possibly with \(\ne \), by appealing to the extensions presented in [9].
As opposed to the skeptical or cautious semantics that sanctions as true what is true in all models. Both semantics as supported by the DLV system [31].
In contrast, hard program-constraints, of the form \(\leftarrow { Body}\), eliminate the models where they are violated, i.e. where Body is satisfied. WPCs as those above are sometimes denoted with \(\Leftarrow \ P'(t;{\bar{x}},\mathsf{d})\). If we used the hard program constraint \(\leftarrow \ P'(t;{\bar{x}},\mathsf{d})\) instead of the WPCs, we would be prohibiting tuple deletions. This would result in the empty set of models or just the original D in case the latter is consistent.
It would be easy to consider tids in queries and view definitions, but they do not contribute to the final result and will only complicate the notation. So, we skip tuple ids whenever possible.
Under null-based repairs no tuples are deleted or inserted, so the original tids stay all in the repairs and none is created.
If instead of (24) we had \(\kappa \!: \ \lnot \exists x \exists y \exists z( { Path}(x,y) \wedge { Near}(y,z) \wedge y<3)\), the new rule body could be \({ Path}'(t_1;x,y,{\mathbf {t}}),~ { Near}'(t_2;y,z,{\mathbf {t}}),y < 3\), because \({ null}< 3\) would be evaluated as false.
The proof of this claim is relatively straightforward, but rather long. It follows the same pattern as the proof that tuple-based repairs w.r.t. integrity constraints can be specified by means of disjunctive logic programs with stable model semantics [8, Proposition 6.1].
It is non-monotonic in that its violation view, which captures the tuples that violate it, is defined by a non-monotonic query. Monotonic ICs, i.e., for which a growing database may only produce more violations (e.g. denial constraints and FDs), are not much of an issue in this causality setting with conjunctive queries, because they stay satisfied under counterfactual deletions associated with causes. So here, the most relevant of the usual ICs are non-monotonic.
They should be safe in the sense that a variable in a negative literals has to appear in some positive literal too.

References

Abiteboul S, Hull R, Vianu V (1995) Foundations of databases. Addison-Wesley, Berlin
MATH Google Scholar
Arenas M, Bertossi L, Chomicki J (1999) Consistent query answers in inconsistent databases. In: Proceedings 1999 symposium on principles of database systems, pp 68–79
Arenas M, Bertossi L, Chomicki J (2003) Answer sets for consistent query answers. Theory Pract Log Program 3(4&5):393–424
Article MathSciNet Google Scholar
Barcelo P, Bertossi L, Bravo L (2003) Characterizing and computing semantically correct answers from databases with annotated logic and answer sets. In: Semantics in databases. LNCS 2582. Springer, Berlin, pp 7–33
Bertossi L, Bravo L, Franconi E et al (2008) The complexity and approximation of fixing numerical attributes in databases under integrity constraints. Inf Syst 33(4):407–434
Article Google Scholar
Bertossi L (2011) Database repairing and consistent query answering. In: Synthesis lectures on data management. Morgan & Claypool, London
Bertossi L, Li L (2013) Achieving data privacy through secrecy views and null-based virtual updates. IEEE Trans Knowl Data Eng 25(5):987–1000
Article Google Scholar
Bertossi L, Bravo L (2017) Consistency and trust in peer data exchange systems. Theory Pract Log Program 17(2):148–204 (Extended version as Corr Arxiv Paper arXiv:1606.01930 [cs.DB])
Article MathSciNet Google Scholar
Bertossi L, Salimi B (2017) From causes for database queries to repairs and model-based diagnosis and back. Theory Comput Syst 61(1):191–232
Article MathSciNet Google Scholar
Bertossi L, Salimi B (2017) Causes for query answers from databases: datalog abduction, view-updates, and integrity constraints. Int J Approx Reason 90:226–252
Article MathSciNet Google Scholar
Bertossi L (2018) Characterizing and computing causes for query answers in databases from database repairs and repair programs. In: Proceedings 2018 international symposium on foundations of information and knowledge systems, LNCS 10833. Springer, Berlin, pp 55–76
Bertossi L (2018) Characterizing causes for query answers in databases via database repairs and their computation through repair programs. Revised and extended version of (Bertossi, 2018), Corr Arxiv Paper arXiv:1712.01001 [cs.DB]
Bertossi L (2018) Measuring and computing database inconsistency via repairs. In: Proceedings 2018 scalable uncertainty management international conference, LNAI 11142. Springer, Berlin, pp 368–372
Bertossi L (2019) Repair-based degrees of database inconsistency: computation and complexity. In: Proceedings 2019 international conference on logic programming and non-monotonic reasoning, LNCS 11481. Springer, Berlin, pp 195–209
Brewka G, Eiter T, Truszczynski M (2011) Answer set programming at a glance. Commun ACM 54(12):93–103
Article Google Scholar
Buccafurri F, Leone N, Rullo P (2000) Enhancing disjunctive datalog by constraints. IEEE Trans Knowl Data Eng 12(5):845–860
Article Google Scholar
Calimeri F, Cozza S, Ianni G (2008) Computable functions in asp: theory and implementation. Proceedings 2008 international conference on logic programming, LNCS 5366. Springer, Berlin, pp 407–424
Calimeri F, Cozza S, Ianni G (2009) An asp system with functions, lists, and sets. In: Proceedings 2009 international conference on logic programming and non-monotonic reasoning, LNCS 5753. Springer, Berlin, pp 483–489
Caniupan-Marileo M, Bertossi L (2010) The consistency extractor system: answer set programs for consistent query answering in databases. Data Knowl Eng 69(6):545–572
Article Google Scholar
Chockler H, Halpern J (2004) Responsibility and blame: a structural-model approach. J Artif Intell Res 22:93–115
Article MathSciNet Google Scholar
Chomicki J, Marcinkowski J (2005) Minimal-change integrity maintenance using tuple deletions. Inf Comput 197(1–2):90–121
Article MathSciNet Google Scholar
Chou T, Winslett M (1994) A model-based belief revision system. J Autom Reason 12:157–208
Article MathSciNet Google Scholar
Dantsin E, Eiter T, Gottlob G et al (2001) Complexity and expressive power of logic programming. ACM Comput Surv 33(3):374–425
Article Google Scholar
Eiter T, Gottlob G, Mannila H (1997) Disjunctive datalog. ACM Trans Datab Syst 22(3):364–418
Article Google Scholar
Eiter T, Ianni G, Lukasiewicz T et al (2008) Combining answer set programming with description logics for the semantic web. Artif Intell 172(12–13):1495–1539
Article MathSciNet Google Scholar
Faber W, Pfeifer G, Leone N et al (2008) Design and implementation of aggregate functions in the DLV system. Theory Pract Log Program 8(5–6):545–580
Article MathSciNet Google Scholar
Gebser M, Kaminski R, Schaub T (2011) Complex optimization in answer set programming. Theory Pract Log Program 11(4–5):821–839
Article MathSciNet Google Scholar
Gebser M, Kaminski R, Kaufmann B et al (2012) Answer set solving in practice. In: Synthesis lectures on artificial intelligence and machine learning. Morgan & Claypool Publishers, London
Gelfond M, Kahl Y (2014) Knowledge representation and reasoning, and the design of intelligent agents. Cambridge University Press, Cambridge
Book Google Scholar
Halpern J, Pearl J (2005) Causes and explanations: a structural-model approach: part 1. Br J Philos Sci 56:843–887
Article Google Scholar
Leone N, Pfeifer G, Faber W et al (2006) The DLV system for knowledge representation and reasoning. ACM Trans Comput Log 7(3):499–562
Article MathSciNet Google Scholar
Lloyd J (1987) Foundations of logic programming. Springer, Berlin
Book Google Scholar
Lopatenko A, Bertossi L (2007) Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics. In: Proceedings 2007 international conference on database theory. LNCS 4353. Springer, Berlin, pp 179–193
Marek V, Truszczynski M (1998) Revision programming. Theoret Comput Sci 190(2):241–277
Article MathSciNet Google Scholar
Meliou A, Gatterbauer W, Moore KF et al (2010) The complexity of causality and responsibility for query answers and non-answers. In: Proceedings 2010 very large data bases conference, pp 34–41
Salimi B, Bertossi L, Suciu D et al (2016) Quantifying causal effects on query answering in databases. In: Proceedings 2016 USENIX workshop on the theory and practice of provenance
Staworko S, Chomicki J, Marcinkowski J (2012) Prioritized repairing and consistent query answering in relational databases. Ann Math Artif Intel 64(2–3):209–246
Article MathSciNet Google Scholar
Wijsen J (2005) Database repairing using updates. ACM Trans Datab Syst 30(3):722–768
Article Google Scholar

Download references

Acknowledgements

Part of this work was done by the author as a Senior Computer Scientist at RelationalAI Inc., and also while the author was spending a sabbatical at the “Database and Artificial Intelligence” Group of the Technical University of Vienna with support from the “Vienna Center for Logic and Algorithms” and the Wolfgang Pauli Society. The author is extremely grateful for their support and hospitality, and especially to Prof. Georg Gottlob for making the stay possible. Many thanks to the anonymous reviewers [11] for their excellent feedback. The valuable help provided by Jordan Li with the examples in DLV is much appreciated. Part of this work was funded by ANID—Millennium Science Initiative Program—Code ICN17_002.

Author information

Authors and Affiliations

Faculty of Engineering and Sciences, Adolfo Ibáñez University, Santiago, Chile
Leopoldo Bertossi
Millennium Institute on Foundations of Data (IMFD, Chile), Santiago, Chile
Leopoldo Bertossi

Authors

Leopoldo Bertossi
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Leopoldo Bertossi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A. Examples of tuple-based causes with DLV-complex

In this section, we show in detail how the examples and repairs-programs extended with causality elements of Sect. 5 can be specified and executed in the DLV-Complex system [17, 18].

Example A.1

(Example 2.3, 3.1, 5.1–5.3 cont.) The first fragment of the DLV program below, in its non-disjunctive version, shows facts for database D, and the three repair rules for the DC \(\kappa ({\mathcal {Q}})\). In it, and in the rest of this section, predicates +R+, +S+ stand for \({ Receives}\) and \({ Store}\), resp.; and \(\texttt {R}_\texttt {a}\), \(\texttt {S}_\texttt {a}\),...+ stand for \({ Receives}', { Store}',\ldots \) used before, with the subscript \(_\texttt {a}\) for “auxiliary.” We recall that the first attribute of a predicate holds a variable or a constant for a tid; and the last attribute of \(\texttt {R}_\texttt {a}\), etc. holds an annotation constant, +d+ or +s+, for “deleted” (from the database) or “stays” in a repair, resp.

This is the non-disjunctive version of the repair-program given in disjunctive form in Example 5.1, which in DLV takes the following form: (we will keep using the non-disjunctive versions of these programs)

DLV returns the stable models of the program, as follows:

These three stable models (that do not show here the original EDB) are associated to the S-repairs \(D_1, D_2, D_3\) in Example 3.1, resp. As expected from Example 2.3, only tuples with tids 1, 3, 4, 6 are at some point deleted. In particular, the last model corresponds to the C-repair \(D_1 = \{{ Receives}(s_4,s_3), { Receives}(s_2,s_1), { Receives}(s_3,s_3),\) \( { Store}(s_4), { Store}(s_2)\}\).

Now, to compute causes and their accompanying deleted tuples we add to the program the rules defining \({ Cause}\), in (14) and (15), and \({ CauCont}\), in Example 5.2:

Next, contingency sets can be computed by means of DLV-Complex, on the basis of the rules defining predicates \({ cause}\) and \({ cauCont}\) above:

The last two rules play the role of rules (16) and (17), that associate the empty contingency set to counterfactual causes.

The three stable models obtained above will now be extended with \({ cause}\)- and \({ cont}\)-atoms, among others (unless otherwise stated, \({ preCont}\)-, \({ tmpCont}\)-, and \({ HoleIn}\)-atoms will be filtered out from the output); as follows:

The first two models above show tuple 3 as an actual cause, with one contingency set per each of the models where it appears as a cause. The last line of the third model shows that cause (with tid) 6 is the only counterfactual cause (its contingency set is empty).

The responsibility \(\rho \) can be computed via predicate \({ preRho}(T,N)\), defined in (22), that returns \(N = \frac{1}{\rho }\), that is the inverse of the responsibility, for each tuple with tid T and local to a model that shows T as a cause. We concentrate on the computation of \({ preRho}\) in order to compute with integer numbers, as supported by DLV-Complex, which requires setting an upper integer bound by means of maxint, in this case, at least as large as the largest tid:

where the local (pre)responsibility of a cause (with tid) T within a repair is obtained by counting how many instances of \({ cauCont}(T,?)\) exist in the model, which is the size of the local contingency set for T plus 1. We obtain the following (filtered) output:

The first model shows causes 3 and 4 with a pre-rho value of 2. The second one, causes 3 and 1 with a pre-rho value of 2. The last model shows cause 6 with a pre-rho value of 1. This is also a maximum-responsibility cause, actually associated with a C-repair. Inspecting the three models, we can see that the overall pre-responsibility of cause 3 (the minimum of its pre-rho values) is 2, similarly for cause 1. For cause 6 the overall pre-responsibility value is 1.

Now, if we want only maximum-responsibility causes, we add weak program constraints to the program above, to minimize the number of deletions:

DLV shows only repairs with the least number of deletions, in this case:

As expected, only repair \(D_1\) is obtained, where only \({ Store}(6,s_3)\) is a cause, and with responsibility 1, making it a maximum-responsibility cause.

Example A.2

(Example 5.4 cont.) We proceed as in Example A.1, with the repair-program being: (again, in non-disjunctive form, but DLV accepts disjunction)

Now, we define the contingency sets and the local (pre)responsibilities for every cause in each model:

We obtain the following output, showing four S-repairs, with the last three being C-repairs:

Cause 5, for example, appears in the first and third repairs, which are an S-repair and C-repair, resp. If we do not want to start inspecting the kinds of repairs where a cause appears, and we haven’t pruned non-C-repairs, then we may pose a query of the form \({ preRho}(5,N)?\) against this program under. This is done by inserting the query at the end of the program (in file +file+), as +preRho(5,N)?+. Then in the command line, one types: +dlv -brave file+, obtaining as expected:

The least of the returned values will give us the global pre-responsibility value, which can be used to compute tuple 5’s (global) responsibility: \(\frac{1}{2}\). If we want all causes with their local pre-responsibilities, we pose instead the query: preRho(T,N)?.

If, as in Example A.1, we impose weak constraints to obtain only C-repairs:

we obtain:

As expected for C-repairs, the local pre-responsibilities for a cause coincide.

B. An example of null-based causes with DLV

Example B.1

(Example 4.2 and 4.6 cont.) In this DLV program S stands for \({ Store}\), and R, for \({ Receives}\). The tuples already contain tuple-ids. The program is similar to that in Example 6.2.

The query S(T1,X), R(T2,X,Y)? to the program under the brave semantics returns tuples of the form T1 X T2 Y, showing that tuples (with tids) 2,3,4,5 are responsible for the violation of the DC:

Two stable models are returned, corresponding to two attribute-based repairs:

Rights and permissions

Reprints and permissions

About this article

Cite this article

Bertossi, L. Specifying and computing causes for query answers in databases via database repairs and repair-programs. Knowl Inf Syst 63, 199–231 (2021). https://doi.org/10.1007/s10115-020-01516-6

Download citation

Received: 02 March 2019
Revised: 29 September 2020
Accepted: 04 October 2020
Published: 03 November 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s10115-020-01516-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Specifying and computing causes for query answers in databases via database repairs and repair-programs

Abstract

Access this article

Similar content being viewed by others

Design methods for the new database era: a systematic literature review

A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration

Comparing Oracle and PostgreSQL, Performance and Optimization

Notes

References

Acknowledgements