Abstract
In this work we establish and investigate connections between causes for query answers in databases, database repairs with respect to denial constraints, and consistency-based diagnosis. The first two are relatively new research areas in databases, and the third one is an established subject in knowledge representation. We show how to obtain database repairs from causes, and the other way around. Causality problems are formulated as diagnosis problems, and the diagnoses provide causes and their responsibilities. The vast body of research on database repairs can be applied to the newer problems of computing actual causes for query answers and their responsibilities. These connections are interesting per se. They also allow us, after a transition inspired by consistency-based diagnosis to computational problems on hitting-sets and vertex covers in hypergraphs, to obtain several new algorithmic and complexity results for database causality.
Similar content being viewed by others
Notes
In contrast with general causal claims, such as “smoking causes cancer”, which refer some sort of related events, actual causation specifies a particular instantiation of a causal relationship, e.g., “Joe’s smoking is a cause for his cancer”.
Although not in the context of repairs, consistency-based diagnosis has been applied to consistency restoration of a database with respect to integrity constraints [30].
As opposed to built-in predicates (e.g. ≠) that we assume do not appear, unless explicitly stated otherwise.
In this work, we will assume, unless otherwise explicitly said, that CQs may contain inequality atoms (equality atoms are not an issue, because they can always be eliminated).
In general, in the context of repairs, partitions on instances are not considered. However, in Section 7.3 we will bring them into the repair scene.
Here, and as usual, the atom Ab(c) expresses that component c is (behaving) abnormal(ly).
Cf. [4] for an example of the latter that uses key constraints, which are DCs with inequalities (with violation views that contain inequality).
For a precise formulation, see Definition 5.
Actually, [47] presents a PTIME algorithm for computing responsibilities for a restricted class of CQs.
If \(\phantom {\dot {i}\!}\mathcal {C}\) is a collection of non-empty subsets of a set S, a subset S ′⊆S is a hitting-set for \(\phantom {\dot {i}\!}\mathcal { C}\) if, for every \(\phantom {\dot {i}\!}C \in \mathcal { C}\), C∩S ′≠∅. S ′ is an S-minimal hitting-set if no proper subset of it is also a hitting-set. S is a minimum hitting-set if it has minimum cardinality.
The other direction is beyond the scope of this work. More importantly, logic-based diagnosis in general is a much richer scenario than that of database causality. In the former, we can have arbitrary logical specification, whereas under data causality, we have only monotone queries at hand.
Notice that these can also be seen as DCs, since they can be written as \(\phantom {\dot {i}\!}\forall \bar {x} \neg {Ab}_{P}(\bar {x})\).
Notice that these are not denial constraints.
In an hypergraph \(\phantom {\dot {i}\!}\mathcal {H}\), a set of vertices is a vertex cover if it intersects every hyperedge. A minimal vertex cover has no proper subset that is also a vertex cover. A minimum vertex cover has minimum cardinality among the vertex covers. Similarly, an independent set of \(\phantom {\dot {i}\!}\mathcal {H}\) is a set of vertices such that no pair of them is contained in a hyperedge. Maximal and maximum independent sets are defined in an obvious manner.
Or any other “abducible” predicates that are different from those in \(\phantom {\dot {i}\!}\mathcal {S}\).
This condition is clearly satisfied by the logical reconstruction of a relational database, but can be relaxed in several ways.
We could say that the efforts in [35, 36] to modify the Halpern-Pearl (HP) original definition of causality are about considering more appropriate restrictions on contingencies. Since in some cases the original HP definition does not provide intuitive results regarding causality, the modifications avoid this by recognizing some contingencies as “unreasonable” or “farfetched”.
We can say {t,t ′} is a conflict, i.e. the two tuples jointly participate in the violation of one of the DCs in Σ.
Pairs of conflicting tuples would inherit the priority relationships from the general priority relation.
Of course, we could use other optimality criteria at this points, but considering all possibilities is beyond the scope of this work.
An alternative, but equivalent formulation can be found in [8].
References
Afrati, F., Kolaitis, P.: Repair checking in inconsistent databases: Algorithms and complexity. Proc. ICDT, 31–41 (2009)
Arenas, M., Bertossi, L., Chomicki, J.: Consistent query answers in inconsistent databases. Proc. ACM PODS, 68–79 (1999)
Arenas, M., Bertossi, L., Chomicki, J.: Answer sets for consistent query answers. Theory Pract. Logic Programm. 3(4–5), 393–424 (2003)
Arenas, M., Bertossi, L., Chomicki, J., He, X., Raghavan, V., Spinrad, J.: Scalar aggregation in inconsistent databases. Theor. Comput. Sci. 296, 405–434 (2003)
Arieli, O., Denecker, M., Van Nuffelen, B., Bruynooghe, M.: Coherent integration of databases by abductive logic programming. J. Artif Intell. Res. 21, 245–286 (2004)
Barcelo, P., Bertossi, L., Bravo, L.: Characterizing and computing semantically correct answers from databases with annotated logic and answer sets. in semantics of databases. In: Semantics of Databases, Springer LNCS 2582, pp 1–27 (2003)
Bertossi, L.: Consistent query answering in databases. ACM SIGMOD Rec. 35(2), 68–76 (2006)
Bertossi, L., Li, L.: Achieving data privacy through secrecy views and null-based virtual updates. IEEE Trans Knowl. Data Eng. 25(5), 987–1000 (2013)
Bertossi, L.: Database repairing and consistent query answering, Morgan & Claypool, Synthesis Lectures on Data Management (2011)
Bertossi, L., Salimi, B.: Unifying causality, diagnosis, repairs and view-updates in databases. Presented at the First International Workshop on Big Uncertain Data (BUDA 2014). Posted at: arXiv:http://arXiv.org/abs/1405.4228 [cs.DB]
Brankovic, L., Fernau, H.H.: Parameterized approximation algorithms for hitting set. In: Approximation and Online Algorithms, pp 63–76. Springer LNCS 7164 (2012)
Bravo, L., Bertossi, L.: Semantically correct query answers in the presence of null values. In: Chomicki, J., Wijsen, J. (eds.) Proceedings EDBT WS on Inconsistency and Incompleteness in Databases (IIDB 06), pp 336–357. Springer LNCS 4254 (2006)
Buneman, P., Khanna, S., Tan, W.C.: Why and where: A characterization of data provenance. Proc. ICDT, 316–330 (2001)
Buneman, P., Tan, W.C.: Provenance in databases. Proc. ACM SIGMOD, 1171–1173 (2007)
Cheney, J., Chiticariu, L., Tan, W.C: Provenance in databases why, how, and where. Found. Trends Databases 1(4), 379–474 (2009)
Cheney, J., Chong, S., Foster, N., Seltzer, M.I., Vansummeren, S.: Provenance a future history. OOPSLA Companion (Onward!), 957–964 (2009)
Cheney, J.: Is Provenance Logical?. Proc. LID, 2–6 (2011)
Chomicki, J., Marcinkowski, J.: Minimal-change integrity maintenance using tuple deletions. Inf. Comput. 197(1-2), 90–121 (2005)
Chockler, H., Halpern, J.Y.: Responsibility and blame: a structural-model approach. J. Artif. Intell. Res. 22, 93–115 (2004)
Console, L., Torasso, P.: A spectrum of logical definitions of model-based diagnosis. Comput. Intell. 7, 133–141 (1991)
Console, L., Sapino, M.L., Theseider-Dupre, D.: The role of abduction in database view updating. J. Intell. Inf. Syst. 4(3), 261–280 (1995)
Cui, Y., Widom, J., Wiener, J.L.: Tracing the lineage of view data in a warehousing environment. ACM Trans. Database Syst. 25(2), 179–227 (2000)
Eiter, T., Gottlob, G., Leone, N.: Abduction from logic programs semantics and complexity. Theor. Comput. Sci. 189(1-2), 129–177 (1997)
Eiter, T. h., Faber, W., Leone, N., Pfeifer, G.: The diagnosis frontend of the DLV system. AI Commun. 12(1-2), 99–111 (1999)
Fagin, R., Kimelfeld, B., Kolaitis, Ph.: Dichotomies in the complexity of preferred repairs. Proc. ACM PODS, 3–15 (2015)
Fernau, H.: Parameterized algorithmics for d-hitting set. Int. J. Comput. Math. 87(14), 3157–3174 (2010)
Feldman, A., Provan, G., Gemund, A.V.: Approximate model-based diagnosis using greedy stochastic search. J. Artif. Intell. Res. (JAIR) 87(14), 3157–3174 (2010)
Flum, J., Grohe, M.: Parameterized complexity theory. Texts in Theoretical Computer Science, Springer Verlag (2006)
Garey, M., Johnson, D.S.: Computers and intractability: a guide to the theory of NP-completenes. W. H. Freeman (1979)
Gertz, M.: Diagnosis and repair of constraint violations in database systems. PhD Thesis, Universität Hannover (1996)
Greco, S., Pijcke, F., Wijsen, J.: Certain query answering in partially consistent databases. PVLDB 7(5), 353–364 (2014)
Halpern, J., Pearl, J.: Causes and explanations: a structural-model approach: part 1. Proc. UAI, 194–202 (2001)
Halpern, J., Pearl, J.: Causes and explanations: a structural-model approach: part 1. British J. Philos. Sci. 56, 843–887 (2005)
Halperin, E.: Improved approximation algorithms for the vertex cover problem in graphs and hyper-graphs. Proceedings ACM-SIAM Symposium on Discrete Algorithms, 329–337 (2000)
Halpern, J.: Appropriate causal models and stability of causation. Proc. KR’14 (2014)
Halpern, J.: A modification of Halpern-Pearl definition of causality. Proc. IJCAI (2015)
Kakas A. C., Mancarella, P.: Database updates through abduction. Proc. VLDB, 650–661 (1990)
Karvounarakis, G., Green, T.J.: Semiring-annotated data queries and provenance? SIGMOD Rec. 41(3), 5–14 (2012)
Krentel, M.: The complexity of optimization problems. J. Comput. Syst. 36, 490–509 (1988)
Karvounarakis, G., Ives Z. G., Tannen, V.: Querying provenance. Proc. ACM SIGMOD, 951–962 (2010)
Kimelfeld, B.: A dichotomy in the complexity of deletion propagation with functional dependencies. Proc. ACM PODS (2012)
Kimelfeld, B., Vondrak, J., Williams, R.: Maximizing conjunctive views in deletion propagation. ACM Trans. Database Syst. 37(4), 24 (2012)
Lopatenko, A., Bertossi, L.: Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics. Proc. ICDT, 2007, Springer LNCS 4353, pp. 179–193. Proofs of results are found in [44]
Lopatenko, A., Bertossi, L.: Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics. Extended version of [43], including proofs. Posted at: arXiv:http://arXiv.org/abs/cs/1605.07159 [cs.DB]
Meliou, A., Gatterbauer, W., Suciu, D.: Reverse data management. PVLDB 4(12), 1490–1493 (2011)
Meliou, A., Gatterbauer, W., Suciu, D.: Bringing provenance to its full potential using causal reasoning. Proc. TaPP (2011)
Meliou, A., Gatterbauer, W., Moore K. F., Suciu, D.: The complexity of causality and responsibility for query answers and non-answers. Proc. VLDB, 34–41 (2010)
Meliou, A., Gatterbauer, W., Halpern, J.Y., Koch, C., Moore K. F., Suciu, D.: Causality in databases. IEEE Data Eng. Bull. 33(3), 59–67 (2010)
Mozetic, I., Holzbaur, C.: Controlling the complexity in model-based diagnosis. Ann. Math. Artif. Intell. 11(1-4), 297–314 (1994)
Niedermeier, R., Rossmanith, P.: An efficient fixed-parameter algorithm for 3-hitting set. J Discret. Algorithm. 1(1), 89–102 (2003)
Okun, M.: On approximation of the vertex cover problem in hypergraphs. Discret. Optim. 2(1), 101–111 (2005)
Papadimitriou, C.H.: Computational complexity. Addison-Wesley (1994)
Reiter, R.: A theory of diagnosis from first principles. Artif. Intell. 32(1), 57–95 (1987)
Reiter, R.: Towards a logical reconstruction of relational database theory. In: On Conceptual Modelling, pp 191–233. Springer (1984)
Salimi, B., Bertossi, L.: Causality in databases: the diagnosis and repair connections. Presented at The 15th International Workshop on Non-Monotonic Reasoning (NMR 2014). Posted at: arXiv:1404.6857[cs.DB]
Salimi, B., Bertossi, L.: Causes for query answers from databases, datalog abduction and view-updates: the presence of integrity constraints. Proc. FLAIRS, 2016. Posted as Corr arXiv:http://arXiv.org/abs/cs.DB/1602.06458
Salimi, B., Bertossi, L.: Query-answer causality in databases: abductive diagnosis and view-updates. In: Proceedings UAI Causal Inference Workshop, 2015. CEUR-WS Proceedings Vol-1504 (2015)
Salimi, B., Bertossi, L.: From causes for database queries to repairs and model-based diagnosis and back. In: Proceedings 18th International Conference on Database Theory (ICDT 2015)
Staworko, S., Chomicki, J., Marcinkowski, J.: Prioritized repairing and consistent query answering in relational databases. Ann. Math. Artif. Intell. 64(2-3), 209–246 (2012)
Struss, P.: Model-based problem solving. In: Handbook of Knowledge Representation, chap. 10. Elsevier (2008)
Tannen, V.: Provenance propagation in complex queries. In: Buneman Festschrift, 2013, Springer LNCS 8000, pp. 483–493
Acknowledgments
Research funded by NSERC Discovery, and the NSERC Strategic Network on Business Intelligence (BIN). Conversations with Alexandra Meliou during Leo Bertossi’s visit to U. of Washington in 2011 are much appreciated. He is also grateful to Dan Suciu and Wolfgang Gatterbauer for their hospitality. L. Bertossi is grateful to Benny Kimelfeld for stimulating conversations. Part of the research was developed by L. Bertossi during partial sabbatical stays at LogicBlox and The Center for Semantic Web Research (Chile). Their support is much appreciated. We appreciate the comments from the anonymous reviewers.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Bertossi, L., Salimi, B. From Causes for Database Queries to Repairs and Model-Based Diagnosis and Back. Theory Comput Syst 61, 191–232 (2017). https://doi.org/10.1007/s00224-016-9718-9
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00224-016-9718-9