Introduction

Confidential data are data meant to be protected from unauthorized accesses and disclosure. Examples include personal information, e.g., individual phone numbers, street addresses, biometric identifiers, or business data, such as trade secrets, company payment details, customer profiles, to mention a few. Confidentiality preservation is the task of maintaining data protected from breaches, i.e., situations in which confidential data are accessed without permission.

The actions that can be undertaken to prevent data breaches range from general infrastructure security measures, such as the use of firewalls or intrusion detection systems [33], to privileges management for access control, i.e., establishing who can access non-public data, for what purposes, and under which conditions [12], to data perturbation and filtering, such as data anonymization [2], encryption [4], or the use of (virtual) views [38] that show to a user only data that she is authorized to see.

Among the approaches based on some form of data filtering to preserve confidentiality, Controlled Query Evaluation (CQE) has the characteristic to be fully declarative and grounded on logic. In CQE, confidentiality requirements are expressed through a policy, i.e., a set of logical assertions that specify the information that must be kept secret, and a function, called censor, that is used to modify answers to queries so that confidential data cannot be inferred by the users on the basis of the answers they get and the knowledge they have.

Censors for confidentiality-preserving query answering have been first studied in [34] for complete propositional databases. The framework has then revived years later and extensively investigated for various forms of censors [5,6,7,8], even for the case of incomplete databases [9, 13]. More recently, CQE has been tackled in the context of Description Logic (DL) ontologies [14, 22, 23], where data are stored in the ontology component called ABox, whereas a second component, called TBox, captures intensional knowledge.

In this paper, we review some advancements we achieved in the last years in the latter field. More precisely, we recall the general framework for CQE given in [28] and mainly focus on two kinds of censors proposed in that paper, namely CQ censors and GA censors, recalling their crucial characteristics, previously shown in [17, 18, 20, 28]. In a nutshell, a CQ censor (resp. GA censor) is a function returning a subset of all Boolean conjunctive queries (resp. of all ground atoms, i.e., facts) that are inferred by the ontology and are compliant with the policy. The policy we consider is expressed as a set of denial assertions, each one specifying that the answer to a Boolean conjunctive query is confidential. Thus, a censor c is compliant with the policy if c perturbs query answering so that a user can never infer that a query in the policy is inferred by the ontology even though in fact it is. A censor is optimal if it returns a containment-maximal set among the sets returned by all possible censors (either CQ censors or GA censors). Typically, given an ontology and a policy, several optimal censors exist. Then, let \(\mathcal{S}\) be the set returned by a censor c, let \({\mathcal {T}}\) be the TBox of the ontology, and let \(q_u\) be a user query: the answer to \(q_u\) under censor c is the one inferred by \({\mathcal {T}}\cup \mathcal{S}\).

Our framework moves from the one proposed in [22, 23], but revises it by defining CQE as a form of skeptical reasoning over all optimal censors. More precisely, we define CQE as the problem of computing, for every user query \(q_u\), the answers to \(q_u\) that are in the intersection of the answers computed under all optimal censors. This notion of CQE distinguishes our proposal from previous work, which mainly focuses on the problem of constructing one optimal censor. The latter approach has often to make an arbitrary choice on the censor to use for protecting confidential data, whereas we avoid such a discretionary decision through reasoning over all optimal censors.

In our presentation, we start by CQ censors, which correspond to the confidentiality-preserving censors previously considered in [22, 23]. According to the terminology proposed in [14], such censors allow to realize a Simple Confidentiality Model (SCM), in which the aim is that entailed secrets are filtered out. As noted in [11, 14], an SCM is not always able to protect data from sophisticated attacks, where, e.g., the attacker possesses background knowledge and/or meta-knowledge that may help to reconstruct the confidential part of the ontology. Robustness with respect to these kinds of attacks may be achieved through censors enjoying the so-called indistinguishability property. Intuitively, a censor satisfies this property if it modifies answers to queries in such a way that a user can always conceive that the underlying ABox does not contain secrets (this hypothetical ABox is thus indistinguishable from the real one, which possibly contains secrets). As we have shown in [18, 20], CQ censors do not satisfy indistinguishability whereas GA censors do. These last results have strengthened our interest in GA censors, towards the identification of a setting allowing for both robust and practically realizable CQE. In this paper, we thus also review the characteristics of GA censors and discuss whether they allow us to achieve our aim.

Among such characteristics, computational complexity of reasoning turns out to be of paramount importance, because of its inherent connection to practical feasibility. In our studies, we have thus thoroughly investigated the data complexity of CQE [17, 18, 20, 28], i.e., the computational complexity established with respect to the size of the data only [36], which is the complexity measure mainly considered in data-intensive applications. In particular, we recall here the results we devised about the data complexity of answering conjunctive queries according to our revised notion of CQE, under both CQ censors and GA censors, for ontologies specified in the logic \(\text {DL-Lite} _{{\mathcal {R}}}\) [15]. We remark that standard conjunctive query answering in \(\text {DL-Lite} _{{\mathcal {R}}}\) ontologies is computationally tractable (i.e. solvable in polynomial time), specifically in AC\(^0\) in data complexity [15], that is the same data complexity of evaluating an SQL query over a database. By virtue of this characteristic, this logic is frequently adopted in applications in which large datasets are managed through an ontology [29], and has led to the definition of OWL 2 QL [31], the counterpart of \(\text {DL-Lite} _{{\mathcal {R}}}\) within OWL 2, the standard W3C ontology language [32]. As we have shown in [28], CQE for \(\text {DL-Lite} _{{\mathcal {R}}}\) under CQ censors is in PTIME in data complexity, whereas it is coNP-hard for GA censors (it is in AC\(^0\) only for queries that are ground atoms, i.e. for the so-called instance checking problem). To re-gain tractability, in [17] we have proposed, in the context of Ontology-based Data Access, a new notion of censor, called IGA censor, which is a well-founded sound approximation of GA censors. In this paper we thus recall this additional form of censor, which is particularly interesting because, as shown in [18], it enjoys indistinguishability, and at the same time guarantees tractable query answering, namely in AC\(^0\) in data complexity [17, 18].

We remark that the present paper is intended to be an introduction to the problem of CQE in DL ontologies, and in particular in \(\text {DL-Lite} _{{\mathcal {R}}}\), built through basic definitions, examples, and foundational results. To this aim, we gather together into this single article some material already appeared in slightly different shapes in various different papers [17, 18, 20, 28]. To complete our treatment, we also provide here some new results on the so-called ICQ censors, which are censors obtained by intersecting all possible optimal CQ censors, as IGA censors are obtained through the intersection of all optimal GA censors, and, as IGA censors, have the characteristic of being unique, given an ontology TBox and a policy.

We finally note that confidentiality-preservation in DLs and ontologies have been studied also through approaches different from CQE. For example, in [25] the authors propose a technique to import an ontology \({\mathcal {K}}_h\) into another ontology \({\mathcal {K}}_v\) and reason on \({\mathcal {K}}_h \cup {\mathcal {K}}_v\) using the axioms in \({\mathcal {K}}_v\) but just asking queries expressed over a subset of the signature of \({\mathcal {K}}_h\), being the rest of the signature of \({\mathcal {K}}_h\) non-accessible to users for confidentiality reasons. Authorization views in DLs are instead studied in [16, 35], whereas a probabilistic logic-based framework is considered in [21], and anonymization for Linked Data is addressed in [24]. Information disclosure in the context of Ontology-based Data Access is instead studied in [3].

The rest of the paper is organized as follows. In the next section, we provide some preliminaries. In the subsequent section, we deal with CQ and ICQ censors, followed by which, we attack the case of GA and IGA censors, in both sections providing complexity results on CQE for \(\text {DL-Lite} _{{\mathcal {R}}}\) ontologies and discussing the indistinguishability property for all these kinds of censor. Finally, we provide some final discussions and conclude the paper.

Preliminaries

We use standard notions of function-free first-order (FO) logic, and in particular, we consider Description Logics (DLs), which are fragments of FO logic that can be used to represent the domain of interest through individuals (also known as constants), representing real-world objects, atomic concepts, i.e., unary predicates representing sets of objects, and atomic roles, which denote binary relations between objects [1]. We assume to have the pairwise disjoint countably infinite sets \(\Sigma _{\mathcal {C}}\), \(\Sigma _{\mathcal {R}}\), \(\Sigma _{\mathcal {I}}\), and \(\Sigma _{\mathcal {V}}\) for atomic concepts, atomic roles, individuals, and variables, respectively.

Complex expressions of concepts and roles are specified by means of suitable operators applied to atomic concepts and atomic roles. Different DLs allow for different operators in the construction of complex expressions.

For a DL language \({\mathcal {L}}_{{\mathcal {T}}}\), an \({\mathcal {L}}_{{\mathcal {T}}}\) TBox \({\mathcal {T}}\) is a finite set of assertions allowed in the language \({\mathcal {L}}_{{\mathcal {T}}}\), adopting symbols from \(\Sigma _{\mathcal {C}}\cup \Sigma _{\mathcal {R}}\) as predicates and symbols from \(\Sigma _{\mathcal {I}}\cup \Sigma _{\mathcal {V}}\) as terms. The set of atomic concepts, roles, and individuals mentioned in the assertions of \({\mathcal {T}}\) constitutes the signature of \({\mathcal {T}}\). Given a TBox \({\mathcal {T}}\), an ABox \({\mathcal {A}}\) for \({\mathcal {T}}\) is a finite set of ground atoms (which we also refer to as facts) of the form A(a) and P(ab), where A and P are an atomic concept and an atomic role, respectively, occurring in the signature of \({\mathcal {T}}\), whereas a and b belong to \(\Sigma _{\mathcal {I}}\). In what follows, when a TBox \({\mathcal {T}}\) is given, whenever we refer to an ABox \({\mathcal {A}}\), we implicitly assume that \({\mathcal {A}}\) is for \({\mathcal {T}}\).

For a DL language \({\mathcal {L}}_{{\mathcal {T}}}\), an \({\mathcal {L}}_{{\mathcal {T}}}\) ontology is a finite theory \(\text{\O }={\mathcal {T}}\cup {\mathcal {A}}\) constituted by an \({\mathcal {L}}_{{\mathcal {T}}}\) TBox \({\mathcal {T}}\) and by an ABox \({\mathcal {A}}\). The TBox \({\mathcal {T}}\) specifies the intensional knowledge of a modeled domain, whereas the ABox \({\mathcal {A}}\) specifies the extensional knowledge.

The semantics of an ontology \(\text{\O }={\mathcal {T}}\cup {\mathcal {A}}\) is given in terms of FO interpretations [1] \({\mathcal {I}}=\langle \Delta ^{{\mathcal {I}}},\cdot ^{{\mathcal {I}}}\rangle\), where \(\Delta ^{{\mathcal {I}}}\) is the interpretation domain (i.e. a non-empty set of objects), and \(\cdot ^{{\mathcal {I}}}\) is the interpretation function, which assigns to each constant c occurring either in \({\mathcal {T}}\) or in \({\mathcal {A}}\) a domain object \(c^{{\mathcal {I}}} \in \Delta ^{{\mathcal {I}}}\), to each unary predicate A in the signature of \({\mathcal {T}}\) a subset \(A^{{\mathcal {I}}} \subseteq \Delta ^{{\mathcal {I}}}\), and to each binary predicate P in the signature of \({\mathcal {T}}\) a subset \(P^{{\mathcal {I}}} \subseteq \Delta ^{{\mathcal {I}}} \times \Delta ^{{\mathcal {I}}}\). An interpretation \({\mathcal {I}}\) is a model of an ontology \(\text{\O }\) if \({\mathcal {I}}\) satisfies all the assertions occurring in \(\text{\O }\). An ontology \(\text{\O }\) is said to be consistent if it has at least one model, inconsistent otherwise. Given an ontology \(\text{\O }\) and an FO sentence \(\phi\), we say that \(\text{\O }\) entails \(\phi\), denoted by \(\text{\O }\models \phi\), if \(\phi\) is true with respect to every model of \(\text{\O }\). If this is not the case, then we say that \(\text{\O }\) does not entail \(\phi\), denoted by \(\text{\O }\not \models \phi\).

In this paper, we are particularly interested in \(\text {DL-Lite} _{{\mathcal {R}}}\) ontologies, where \(\text {DL-Lite} _{{\mathcal {R}}}\) is the member of the \(\text {DL-Lite}\) family [15] underpinning \(\textsc {owl\,2\,ql}\) [30], i.e., the \(\textsc {owl\,2}\) profile specifically designed for efficient query answering.

Concepts and roles expressions in \(\text {DL-Lite} _{{\mathcal {R}}}\) are formed according to the following syntax:

$$\begin{aligned} \begin{array}{@{}r@{~}c@{~}ll@{~}c@{~}l} B &{}\longrightarrow &{} A ~\mid ~ \exists R \\ R &{}\longrightarrow &{} P ~\mid ~P^{-}, \end{array} \end{aligned}$$

where B is a basic concept, i.e., an expression of the form A, with \(A \in \Sigma _{\mathcal {C}}\), \(\exists P\), with \(P \in \Sigma _{\mathcal {R}}\), or \(\exists P^{-}\). The expressions \(\exists P\) and \(\exists P^{-}\) are called unqualified existential restrictions, which denote the set of objects occurring as first or second argument of P, respectively. R denotes a basic role, i.e., an expression of the form P or \(P^{-}\) (called the inverse of the atomic role P).

A \(\text {DL-Lite} _{{\mathcal {R}}}\) TBox \({\mathcal {T}}\) consists of a finite set of assertions of the following formFootnote 1:

$$\begin{aligned} \begin{array}{@{~}ll@{~}} B_1 \sqsubseteq B_2 &{} B_1 \sqsubseteq \lnot B_2\\ R_1 \sqsubseteq R_2 &{} R_1 \sqsubseteq \lnot R_2. \end{array} \end{aligned}$$

Assertions of the left-hand side are called positive inclusion assertions (or, simply, positive inclusions) specified, from the top to the bottom, between concepts and roles, respectively. Assertions of the right-hand side are called negative inclusion assertions (or, simply, negative inclusions), also called disjointnesses, specified, from the top to the bottom, between concepts and roles respectively.

Given an interpretation \({\mathcal {I}}=\langle \Delta ^{{\mathcal {I}}},\cdot ^{{\mathcal {I}}} \rangle\) of the form described above, the interpretation function \(\cdot ^{{\mathcal {I}}}\) extends to \(\text {DL-Lite} _{{\mathcal {R}}}\) basic concepts and roles as follows.

$$\begin{aligned}{}\begin{array}{r@{~~}c@{~~}l} ({\exists P})^{{\mathcal {I}}} &{}=&{} \{o\mid \exists o'\text{. }\, (o,o')\in P^{{\mathcal {I}}}\}\\ (P^{-})^{{\mathcal {I}}} &{}=&{} \{(o,o') \mid (o',o)\in P^{{\mathcal {I}}}\} \end{array} \end{aligned}$$

To complete the definition of the semantics of a \(\text {DL-Lite} _{{\mathcal {R}}}\) ontology, we define when an interpretation \({\mathcal {I}}\) satisfies TBox and ABox assertions:

  • \({\mathcal {I}}\) satisfies a TBox positive concept inclusion assertion \(B_1 \sqsubseteq B_2\) if \(B_1^{{\mathcal {I}}} \subseteq B_2^{{\mathcal {I}}}\);

  • \({\mathcal {I}}\) satisfies a TBox positive role inclusion assertion \(R_1 \sqsubseteq R_2\) if \(R_1^{{\mathcal {I}}} \subseteq R_2^{{\mathcal {I}}}\);

  • \({\mathcal {I}}\) satisfies a TBox negative concept inclusion assertion \(B_1 \sqsubseteq \lnot B_2\) if \(B_1^{{\mathcal {I}}} \cap B_2^{{\mathcal {I}}}=\emptyset\);

  • \({\mathcal {I}}\) satisfies a TBox negative role inclusion assertion \(R_1 \sqsubseteq \lnot R_2\) if \(R_1^{{\mathcal {I}}} \cap R_2^{{\mathcal {I}}}=\emptyset\);

  • \({\mathcal {I}}\) satisfies an ABox assertion A(a) if \(a^{{\mathcal {I}}} \in A^{{\mathcal {I}}}\);

  • \({\mathcal {I}}\) satisfies an ABox assertion P(ab) if \((a^{{\mathcal {I}}},b^{{\mathcal {I}}}) \in P^{{\mathcal {I}}}\).

As usual in query answering over DL ontologies, we focus on the language of conjunctive queries. A Boolean conjunctive query (BCQ) q is an FO sentence of the form \(\exists \vec{x} \text{. }\phi (\vec{x})\), where \(\vec{x}\) are variables in \(\Sigma _{\mathcal {V}}\), and \(\phi (\vec{x})\) is a finite, non-empty conjunction of atoms of the form \(\alpha (\textbf{t})\), where \(\alpha \in \Sigma _{\mathcal {C}}\cup \Sigma _{\mathcal {R}}\), and each term in \(\textbf{t}\) is either a constant in \(\Sigma _{\mathcal {I}}\) or a variable in \(\vec{x}\). A union Q of BCQs is a Boolean FO sentence of the form \(\bigvee _{i=1}^{n} q_i\), where \(q_i\) is a BCQ for each \(i=1,\ldots ,n\).

We recall that, for every \(\text {DL-Lite} _{{\mathcal {R}}}\) TBox \({\mathcal {T}}\) and union Q of BCQs, it is possible to effectively compute an FO query \(q_r\), called the perfect reformulation of Q with respect to \({\mathcal {T}}\), such that, for each ABox \({\mathcal {A}}\), we have \({\mathcal {T}}\cup {\mathcal {A}}\models Q\) if and only if \(q_r\) evaluates to true over the ABox \({\mathcal {A}}\) seen as an interpretation [15]. This yields the well-known result that answering unions of BCQs over \(\text {DL-Lite} _{{\mathcal {R}}}\) ontologies is FO-rewritable, and therefore the underlying decision problem is in \(\textsc {AC}^0\) in the size of the ABox, i.e., in the so-called data complexity [36]. All the complexity results in this paper concern with data complexity.

Given a TBox \({\mathcal {T}}\), a denial assertion (or simply denial) over \({\mathcal {T}}\) is an FO sentence of the form \(\forall \vec{x}. \phi (\vec{x}) \rightarrow \bot\) over the signature of \({\mathcal {T}}\), such that \(\exists \vec{x}. \phi (\vec{x})\) is a BCQ. Given a \(\text {DL-Lite} _{{\mathcal {R}}}\) ontology \(\text{\O }= {\mathcal {T}}\cup {\mathcal {A}}\) and a set of denials assertions \(\mathcal{P}\) over \({\mathcal {T}}\), we note that the theory \(\text{\O }\cup \mathcal{P}\) is consistent if and only if \(\text{\O }\not \models \exists \vec{x}. \phi (\vec{x})\) holds for each denial assertions \(\forall \vec{x}. \phi (\vec{x}) \rightarrow \bot\) occurring in \(\mathcal{P}\) [27].

In the following, with \(\textbf{CQ}\) and \(\textbf{GA}\) we denote the languages of BCQs and ground atoms, respectively, all specified over the alphabets \(\Sigma _{\mathcal {C}}, \Sigma _{\mathcal {R}}, \Sigma _{\mathcal {I}}\), and \(\Sigma _{\mathcal {V}}\). Note that \(\textbf{GA}\subseteq \textbf{CQ}\). Given an ontology \(\text{\O }={\mathcal {T}}\cup {\mathcal {A}}\) and a language \({\mathcal {L}}\subseteq \textbf{CQ}\), with \({\mathcal {L}}(\text{\O })\) we refer to the subset of \({\mathcal {L}}\) containing all those sentences constructible using the atomic concepts and atomic roles in the signature of \({\mathcal {T}}\) as predicates, and the constants occurring in \({\mathcal {T}}\cup {\mathcal {A}}\) and the variables in \(\Sigma _{\mathcal {V}}\) as terms. Given a language \({\mathcal {L}}\subseteq \textbf{CQ}\), a TBox \({\mathcal {T}}\), and an ABox \({\mathcal {A}}\), we denote by \(\textsf {cl}_{{\mathcal {L}}}^{{\mathcal {T}}}({\mathcal {A}})\) the set of those sentences in \({\mathcal {L}}({\mathcal {T}}\cup {\mathcal {A}})\) that are entailed by \({\mathcal {T}}\cup {\mathcal {A}}\), i.e., \(\textsf {cl}_{{\mathcal {L}}}^{{\mathcal {T}}}({\mathcal {A}})=\{\phi \mid \phi \in {\mathcal {L}}({\mathcal {T}}\cup {\mathcal {A}}) \text { and } {\mathcal {T}}\cup {\mathcal {A}}\models \phi \}\).

CQ Censors

A CQE specification is a pair \({\mathcal {J}}= \langle {\mathcal {T}},\mathcal{P} \rangle\), where \({\mathcal {T}}\) is a DL TBox and \(\mathcal{P}\) is a policy over \({\mathcal {T}}\), i.e., a finite set of denial assertions over \({\mathcal {T}}\). A CQE instance is a pair \({\mathcal {E}}= \langle {\mathcal {J}},{\mathcal {A}}\rangle\), where \({\mathcal {J}}= \langle {\mathcal {T}},\mathcal{P} \rangle\) is a CQE specification and \({\mathcal {A}}\) is an ABox for \({\mathcal {T}}\). We sometime say that \({\mathcal {A}}\) is an ABox for \({\mathcal {J}}\). In the following, when a TBox \({\mathcal {T}}\) is given, we always assume that the coupled policy is specified over \({\mathcal {T}}\), that each considered ABox \({\mathcal {A}}\) is over the signature of \({\mathcal {T}}\), and that, unless otherwise specified, \({\mathcal {T}}\cup {\mathcal {A}}\) and \({\mathcal {T}}\cup \mathcal{P}\) are consistent.

Example 1

Consider the CQE specification \({\mathcal {J}}=\langle {\mathcal {T}},\mathcal{P} \rangle\), where:

$$\begin{aligned} \begin{array}{rl} {\mathcal {T}}= \{&{} \textsf {Oncol}\sqsubseteq \exists \textsf {WorksIn},\\ &{} \exists \textsf {Cures}\sqsubseteq \exists \textsf {WorksIn},\\ &{} \exists \textsf {Cures}^- \sqsubseteq \textsf {Patient}\ \}\\ \mathcal{P} = \{&{} \textsf {Oncol}(x), \textsf {Cures}(x,y) \rightarrow \bot \ \} \end{array} \end{aligned}$$

In words, the \(\text {DL-Lite} _{{\mathcal {R}}}\) TBox \({\mathcal {T}}\) sanctions that (i) every oncologist (\(\textsf {Oncol}\)) works in (\(\textsf {WorksIn}\)) some place, and (ii) if somebody cures (\(\textsf {Cures}\)) someone else, the first individual must work somewhere and the latter is a patient (\(\textsf {Patient}\)). The data protection policy specified by \(\mathcal{P}\) hides the existence of patients cured by oncologists. \(\square\)

Definition 1

Let \({\mathcal {J}}= \langle {\mathcal {T}},\mathcal{P} \rangle\) be a CQE specification. A conjunctive query censor (CQ censor) \(\textsf{cens}(\cdot )\) for \({\mathcal {J}}\) is a function that, for each ABox \({\mathcal {A}}\), returns a possibly infinite set \(\textsf{cens}({\mathcal {A}})\) such that (i) \(\textsf{cens}({\mathcal {A}}) \subseteq \textsf {cl}_{\textbf{CQ}}^{{\mathcal {T}}}({\mathcal {A}})\) and (ii) \({\mathcal {T}}\cup \mathcal{P} \cup \textsf{cens}({\mathcal {A}})\) is consistent.

Example 2

Consider the CQE specification \({\mathcal {J}}\) of Example 1. The following functions are CQ censors for \({\mathcal {J}}\).

\(\textsf{cens}_1(\cdot ):\):

the function that, given an ABox \({\mathcal {A}}\), returns the set computed by removing from \(\textsf {cl}_{\textbf{CQ}}^{{\mathcal {T}}}({\mathcal {A}})\) all the CQs containing: (i) at least one conjunction \(\textsf {Oncol}(x) \wedge \textsf {Cures}(x,y)\) or (ii) at least one atom \(\textsf {Oncol}(a)\), with \(a \in \Sigma _{\mathcal {I}}\), s.t. \({\mathcal {T}}\cup {\mathcal {A}}\models \exists x.\textsf {Cures}(a,x)\).

\(\textsf{cens}_2(\cdot ):\):

the function that, given an ABox \({\mathcal {A}}\), returns the set computed by removing from \(\textsf {cl}_{\textbf{CQ}}^{{\mathcal {T}}}({\mathcal {A}})\) all the CQs containing: (i) at least one conjunction \(\textsf {Oncol}(x) \wedge \textsf {Cures}(x,y)\) or (ii) at least one atom \(\textsf {Cures}(a,x)\), with \((a,x) \in \Sigma _{\mathcal {I}}\times \Sigma _{\mathcal {V}}\), s.t. \({\mathcal {T}}\cup {\mathcal {A}}\models \textsf {Oncol}(a)\).

\(\textsf{cens}_3(\cdot ):\):

the function that, given an ABox \({\mathcal {A}}\), returns the set computed by removing from \(\textsf {cl}_{\textbf{CQ}}^{{\mathcal {T}}}({\mathcal {A}})\) all the CQs containing at least one atom \(\textsf {Cures}(t_1,t_2)\), with \(t_1,t_2 \in \Sigma _{\mathcal {I}}\cup \Sigma _{\mathcal {V}}\). \(\square\)

Given two CQ censors \(\textsf{cens}_1(\cdot )\) and \(\textsf{cens}_2(\cdot )\) for a CQE specification \({\mathcal {J}}=\langle {\mathcal {T}},\mathcal{P} \rangle\), we say that \(\textsf{cens}_2(\cdot )\) is more informative than \(\textsf{cens}_1(\cdot )\) if (i) \(\textsf{cens}_1({\mathcal {A}}) \subseteq \textsf{cens}_2({\mathcal {A}})\) holds for each ABox \({\mathcal {A}}\) and (ii) \(\textsf{cens}_1({\mathcal {A}}) \subset \textsf{cens}_2({\mathcal {A}})\) holds for at least an ABox \({\mathcal {A}}\).

Definition 2

Let \({\mathcal {J}}= \langle {\mathcal {T}},\mathcal{P} \rangle\) be a CQE specification and \(\textsf{cens}(\cdot )\) be a CQ censor for \({\mathcal {J}}\). We say that \(\textsf{cens}(\cdot )\) is an optimal CQ censor for \({\mathcal {J}}\) if there does not exist a CQ censor \(\textsf{cens}'(\cdot )\) for \({\mathcal {J}}\) that is more informative than \(\textsf{cens}(\cdot )\).

Example 3

Consider the CQE specification \({\mathcal {J}}\) of Example 1 and recall the CQ censors of Example 2. We have that \(\textsf{cens}_1(\cdot )\) and \(\textsf{cens}_2(\cdot )\) are optimal CQ censor for \({\mathcal {J}}\) while \(\textsf{cens}_3(\cdot )\) is not, since \(\textsf{cens}_2(\cdot )\) is more informative than \(\textsf{cens}_3(\cdot )\). Indeed, consider the ABox \({\mathcal {A}}= \{ \textsf {Oncol}(ann), \textsf {Cures}(ann,bob), \textsf {Cures}(sam,tom) \}\). One can verify that \(\textsf{cens}_3({\mathcal {A}}) \subset \textsf{cens}_2({\mathcal {A}})\). Moreover, from the definition of such functions, it is possible to see that \(\textsf{cens}_3({\mathcal {A}}') \subseteq \textsf{cens}_2({\mathcal {A}}')\) for any ABox \({\mathcal {A}}'\). Intuitively, \(\textsf{cens}_3(\cdot )\) hides also all the BCQs containing facts over the role \(\textsf {Cures}\) which could be safely disclosed, like \(\textsf {Cures}(sam,tom)\). \(\triangledown\)

As shown by Example 3, there may exist more than one optimal CQ censor for a given CQE specification \({\mathcal {J}}\). We denote by \(\textsf{optCQCens}({\mathcal {J}})\) the set of optimal CQ censors for \({\mathcal {J}}\).

An important property of optimal CQ censors is that they do not hide information implied by the pair ABox and TBox except in cases where the hiding is functional to not disclose policy-protected data. More formally, optimal CQ censors enjoy what we call knowledge preservation property, i.e. if \(\textsf{cens}(\cdot )\) is an optimal CQ censor for a CQE specification \({\mathcal {J}}=\langle {\mathcal {T}},\mathcal{P} \rangle\), then \(\textsf{cens}({\mathcal {A}}) = \textsf {cl}_{\textbf{CQ}}^{{\mathcal {T}}}({\mathcal {A}})\) holds for each ABox \({\mathcal {A}}\) such that \({\mathcal {T}}\cup \mathcal{P} \cup {\mathcal {A}}\) is consistent. It is easy to see that, in general, the same property is not satisfied by non-optimal CQ censors.

While in other works such as [22, 23] the study was focused on verifying the existence of an optimal censor and its computation, we follow a different approach, and define CQE as a form of skeptical reasoning over all the optimal censors of the underlying CQE specification. To this aim, we provide the definition of entailment under CQ censors.

Definition 3

Let \({\mathcal {E}}=\langle {\mathcal {J}},{\mathcal {A}}\rangle\) be a CQE instance, where \({\mathcal {J}}=\langle {\mathcal {T}},\mathcal{P} \rangle\), and \(\phi\) be an FO sentence. We say that \(\phi\) is CQ-Cens entailed by \({\mathcal {E}}\), denoted by \({\mathcal {E}}\models _{\text {CQ}}^{\text {cqe}}\phi\), if \({\mathcal {T}}\cup \textsf{cens}({\mathcal {A}}) \models \phi\) holds for every \(\textsf{cens}(\cdot ) \in \textsf{optCQCens}({\mathcal {J}})\).

Example 4

Consider the CQE instance \({\mathcal {E}}=\langle {\mathcal {J}},{\mathcal {A}}\rangle\), where \({\mathcal {J}}\) and \({\mathcal {A}}\) are as in Example 3. The following three queries are given:

$$\begin{aligned} \begin{array}{rl} q_1 =&{} \exists x.\textsf {WorksIn}(ann,x)\\ q_2 =&{} \exists x.\textsf {Cures}(x,bob)\\ q_3 =&{} \textsf {Cures}(ann,bob)\\ q_4 =&{} \textsf {Cures}(sam, tom). \end{array} \end{aligned}$$

One can verify that queries \(q_1\), \(q_2\), and \(q_4\) are CQ-Cens entailed by \({\mathcal {E}}\), while \(q_3\) is not. \(\square\)

We now provide the data complexity of the problem of entailment of BCQs in CQE with respect to CQ censors, focusing on \(\text {DL-Lite} _{{\mathcal {R}}}\) as ontology language.

Theorem 1

[ [28]] Let \({\mathcal {J}}=\langle {\mathcal {T}},\mathcal{P} \rangle\) be a CQE specification such that \({\mathcal {T}}\) is a \(\text {DL-Lite} _{{\mathcal {R}}}\) TBox, let \({\mathcal {A}}\) be an ABox, and let q be a BCQ. The problem of deciding whether \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {CQ}}^{\text {cqe}}q\) is in PTIME in data complexity.

Inspired by the work on consistent query answering [26, 27], and towards the identification of a notion of censor that allow us to have a single optimal censor, we introduce below the definition of intersection-based CQ censor.

Definition 4

Let \({\mathcal {J}}= \langle {\mathcal {T}},\mathcal{P} \rangle\) be a CQE specification. An intersection-based CQ censor (ICQ censor) \(\textsf{cens}_{\cap }(\cdot )\) for \({\mathcal {J}}\) is a function that, given an ABox \({\mathcal {A}}\), returns a possibly infinite set \(\textsf{cens}_{\cap }({\mathcal {A}}) = \bigcap _{\textsf{cens}(\cdot ) \in \textsf{optCQCens}({\mathcal {J}})} \textsf{cens}({\mathcal {A}})\).

With the notion of ICQ censor in place, we can provide the following notion of entailment.

Definition 5

Let \({\mathcal {E}}=\langle {\mathcal {J}},{\mathcal {A}}\rangle\) be a CQE instance and \(\phi\) be an FO sentence. We say that \(\phi\) is ICQ-Cens entailed by \({\mathcal {E}}\), denoted by \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {ICQ}}^{\text {cqe}}\phi\), if \({\mathcal {T}}\cup \textsf{cens}_{\cap }({\mathcal {A}}) \models \phi\) holds, where \(\textsf{cens}_{\cap }(\cdot )\) the ICQ censor for \({\mathcal {J}}\).

We now examine the data complexity of the problem of entailment of BCQs in CQE under ICQ censors, again by considering \(\text {DL-Lite} _{{\mathcal {R}}}\) as ontology language. First, we provide the following straightforward lemma, which is crucial to study the computational complexity of this problem.

Lemma 1

Let \({\mathcal {E}}=\langle {\mathcal {J}},{\mathcal {A}}\rangle\) be a CQE instance and q be a BCQ. We have that \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {CQ}}^{\text {cqe}}q\) if and only if \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {ICQ}}^{\text {cqe}}q\).

Proof

The fact that \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {ICQ}}^{\text {cqe}}q\) implies \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {CQ}}^{\text {cqe}}q\) is trivial.

Suppose now that \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {CQ}}^{\text {cqe}}q\). By definition, we have that \({\mathcal {T}}\cup \textsf{cens}({\mathcal {A}}) \models q\) holds for every \(\textsf{cens}(\cdot ) \in \textsf{optCQCens}({\mathcal {J}})\). It immediately follows that \(q \in \textsf{cens}({\mathcal {A}})\) (otherwise \(\textsf{cens}(\cdot )\) would not be optimal) for every \(\textsf{cens}(\cdot ) \in \textsf{optCQCens}({\mathcal {J}})\), and thus \(q \in \textsf{cens}_{\cap }({\mathcal {A}})\), where \(\textsf{cens}_{\cap }\) is the ICQ censor for \({\mathcal {J}}\). We therefore conclude that \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {ICQ}}^{\text {cqe}}q\). \(\square\)

From the above result and Theorem 1, we immediately have the following result.

Theorem 2

Let \({\mathcal {J}}=\langle {\mathcal {T}},\mathcal{P} \rangle\) be a CQE specification such that \({\mathcal {T}}\) is a \(\text {DL-Lite} _{{\mathcal {R}}}\) TBox, \({\mathcal {A}}\) be an ABox, and q be a BCQ. The problem of deciding whether \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {ICQ}}^{\text {cqe}}q\) is in PTIME in data complexity.

It is interesting to note that, for CQE instances with a \(\text {DL-Lite} _{{\mathcal {R}}}\) TBox, both CQ censors and the ICQ censor return sets for which a representation as a finite set of BCQs does not always exist. This behavior implies that in general it is not possible to materialize a set of BCQs to be used for deciding query entailment under these kinds of censors.

Theorem 3

There exists a CQE specification \({\mathcal {J}}= \langle {\mathcal {T}},\mathcal{P} \rangle\), where \({\mathcal {T}}\) is a \(\text {DL-Lite} _{{\mathcal {R}}}\) TBox and \(\textsf{optCQCens}({\mathcal {J}}) = \{\textsf{cens}(\cdot )\}\), and an ABox \({\mathcal {A}}\) such that there does not exist any finite subset \(\mathcal{S}\) of \(\textsf{cens}({\mathcal {A}})\) satisfying the following: \({\mathcal {T}}\cup \mathcal{S} \models q\) if and only if \({\mathcal {T}}\cup \textsf{cens}({\mathcal {A}}) \models q\), for each BCQ q.

Proof

Consider the following \(\text {DL-Lite} _{{\mathcal {R}}}\) CQE specification \({\mathcal {J}}= \langle {\mathcal {T}},P\rangle\), where:

$$\begin{aligned} \begin{array}{rl} {\mathcal {T}}=&{} \{ R \sqsubseteq P, \exists R^- \sqsubseteq \exists R \}\\ \mathcal{P} =&{} \{ \forall x,y.R(x,y) \rightarrow \bot \}. \end{array} \end{aligned}$$

Is it easy to verify that \(\textsf{optCQCens}({\mathcal {J}})\) contains only the following optimal CQ censor:

\(\textsf{cens}(\cdot ):\):

given an ABox \({\mathcal {A}}\), \(\textsf{cens}_1({\mathcal {A}})\) returns the set computed by removing from \(\textsf {cl}_{\textbf{CQ}}^{{\mathcal {T}}}({\mathcal {A}})\) all the CQs containing at least one atom over the predicate R.

Now, consider the following ABox \({\mathcal {A}}= \{ R(a,b) \}\). One can verify that \(\textsf{cens}({\mathcal {A}})\) is semantically equivalent to the following infinite set of queries:

$$\begin{aligned} \begin{array}{rl} \{&{} P(a,b), \\ &{}\exists x,y.P(a,x) \wedge P(x,y),\\ &{}\exists x,y.P(x,b) \wedge P(b,y),\\ &{} \exists x,y,z.P(a,x) \wedge P(x,y) \wedge P(y,z),\\ &{} \exists x,y,z.P(x,b) \wedge P(b,y) \wedge P(y,z),\\ &{} \exists x,y,z,w.P(a,x) \wedge P(x,y) \wedge P(y,z) \wedge P(z,w),\\ &{} \exists x,y,z,w.P(x,b) \wedge P(b,y) \wedge P(y,z) \wedge P(z,w), \\ &{} \ldots \ \}. \end{array} \end{aligned}$$

It is easy to see that, for each natural number \(k \ge 1\), if we let \(\mathcal{S} _k\) be the set of all queries of length at most k (i.e., with at most k atoms in their body), then there exists a query \(q' \in \textsf{cens}({\mathcal {A}})\) of length \(k+1\) such that \(\mathcal{S} _k \not \models q'\). By construction, it easily follows that no finite subset \(\mathcal{S}\) of \(\textsf{cens}({\mathcal {A}})\) exits such that \({\mathcal {T}}\cup \mathcal{S} \models q\) if and only if \({\mathcal {T}}\cup \textsf{cens}({\mathcal {A}}) \models q\), for each BCQ q. \(\square\)

We further observe that, in the above proof, \(\textsf{cens}(\cdot )\) is the only optimal CQ censor for \({\mathcal {J}}\). Then it coincides with the ICQ censor for \({\mathcal {J}}\). Thus, the result holds also for the ICQ censor case.

Another important property of censors that is often addressed in the CQE literature is the so-called indistinguishability. The idea behind this property is that to preserve the confidentiality of a CQE instance \(\langle {\mathcal {J}},{\mathcal {A}}\rangle\), there must exist an ABox \({\mathcal {A}}'\) that contains no sensitive information and is indistinguishable from \({\mathcal {A}}\) in the eyes of the user. The formal definition is given below.

Definition 6

Let \({\mathcal {J}}= \langle {\mathcal {T}},\mathcal{P} \rangle\) be a CQE specification. A CQ censor \(\textsf{cens}(\cdot )\) for \({\mathcal {J}}\) satisfies the indistinguishability property if, for each ABox \({\mathcal {A}}\), there exists an ABox \({\mathcal {A}}'\) such that (i) \({\mathcal {T}}\cup \mathcal{P} \cup {\mathcal {A}}'\) is consistent and (ii) \(\textsf{cens}({\mathcal {A}}) = \textsf{cens}({\mathcal {A}}')\).

Note that, according to the Definition 1, an ICQ censor is in fact a CQ censor (even though not necessarily optimal), and thus the above definition can also be applied to ICQ censors.

Unfortunately, as already observed in [20], in general, CQ censors do not enjoy the indistinguishability property. This is illustrated by the following example.

Example 5

Consider the CQE specification \({\mathcal {J}}= \langle {\mathcal {T}},P\rangle\), where:

$$\begin{aligned} \begin{array}{rl} {\mathcal {T}}=&{} \emptyset \\ \mathcal{P} =&{} \{ \forall x.R(x,b) \rightarrow \bot \}. \end{array} \end{aligned}$$

Note that we have only one censor in \(\textsf{optCQCens}({\mathcal {J}})\), which is:

\(\textsf{cens}(\cdot ):\):

given an ABox \({\mathcal {A}}\), \(\textsf{cens}({\mathcal {A}})\) returns the set computed by removing from \(\textsf {cl}_{\textbf{CQ}}^{{\mathcal {T}}}({\mathcal {A}})\) all the CQs containing at least one atom over the predicate R having in the second position the constant b.

Consider the ABox \({\mathcal {A}}= \{ R(a,b) \}\). We have that \(\textsf{cens}({\mathcal {A}})\) is semantically equivalent to \(\{ \exists x.R(a,x) \}\). According to Definition 6, there must exists an ABox \({\mathcal {A}}'\) such that (i) \({\mathcal {T}}\cup \mathcal{P} \cup {\mathcal {A}}'\) is consistent and (ii) \(\textsf{cens}({\mathcal {A}}) = \textsf{cens}({\mathcal {A}}')\). Since the TBox is empty, the only way to entail the query \(\exists x.R(a,x)\) is to have in \({\mathcal {A}}'\) an atom of the form \(R(a,\alpha )\), where \(\alpha\) is a constant. If \(\alpha = b\), then \({\mathcal {T}}\cup \mathcal{P} \cup {\mathcal {A}}'\) is not consistent, while for every \(\alpha \ne b\) we have that \(R(a,\alpha ) \in \textsf{cens}({\mathcal {A}}')\) but \(R(a,\alpha ) \not \in \textsf{cens}({\mathcal {A}}')\). \(\triangledown\)

Moreover, since in the above example \(\textsf{optCQCens}({\mathcal {J}}) = \{\textsf{cens}(\cdot )\}\), we have that \(\textsf{cens}(\cdot )\) is the ICQ censor for \({\mathcal {J}}\). Thus, the above example also shows that ICQ censors are not guaranteed to enjoy the indistinguishability property.

GA Censors

As shown in the previous section, CQ censors do not enjoy some notable properties, such as having a finite representation or that of indistinguishability. In this section, we provide a new notion of censor which instead enjoys both of these properties.

Definition 7

Let \({\mathcal {J}}= \langle {\mathcal {T}},\mathcal{P} \rangle\) be a CQE specification. A ground atom censor (GA censor) \(\textsf{cens}(\cdot )\) for \({\mathcal {J}}\) is a function that, for each ABox \({\mathcal {A}}\), returns a set \(\textsf{cens}({\mathcal {A}})\) such that (i) \(\textsf{cens}({\mathcal {A}}) \subseteq \textsf {cl}_{\textbf{GA}}^{{\mathcal {T}}}({\mathcal {A}})\) and (ii) \({\mathcal {T}}\cup \mathcal{P} \cup \textsf{cens}({\mathcal {A}})\) is consistent.

Example 6

Consider the CQE specification \({\mathcal {J}}\) of Example 1. The following functions are GA censors for \({\mathcal {J}}\).

\(\textsf{cens}_4(\cdot ):\):

the function that, given an ABox \({\mathcal {A}}\), returns the set computed by removing from \(\textsf {cl}_{\textbf{GA}}^{{\mathcal {T}}}({\mathcal {A}})\) all the facts of the form \(\textsf {Cures}(a,b)\).

\(\textsf{cens}_5(\cdot ):\):

the function that, given an ABox \({\mathcal {A}}\), returns the set computed by removing from \(\textsf {cl}_{\textbf{GA}}^{{\mathcal {T}}}({\mathcal {A}})\) all the facts of the form \(\textsf {Cures}(a,b)\) such that \({\mathcal {T}}\cup {\mathcal {A}}\models \textsf {Oncol}(a)\).

\(\textsf{cens}_6(\cdot ):\):

the function that, given an ABox \({\mathcal {A}}\), returns the set computed by removing from \(\textsf {cl}_{\textbf{GA}}^{{\mathcal {T}}}({\mathcal {A}})\) all the facts of the form \(\textsf {Oncol}(a)\) such that \({\mathcal {T}}\cup {\mathcal {A}}\models \exists y.\textsf {Cures}(a,y)\). \(\square\)

As done for CQ censors, we define when a GA censor is more informative than another and when it is optimal.

Given two GA censors \(\textsf{cens}_1(\cdot )\) and \(\textsf{cens}_2(\cdot )\) for a CQE specification \({\mathcal {J}}=\langle {\mathcal {T}},\mathcal{P} \rangle\), we say that \(\textsf{cens}_2(\cdot )\) is more informative than \(\textsf{cens}_1(\cdot )\) if (i) \(\textsf{cens}_1({\mathcal {A}}) \subseteq \textsf{cens}_2({\mathcal {A}})\) holds for each ABox \({\mathcal {A}}\) and (ii) \(\textsf{cens}_1({\mathcal {A}}) \subset \textsf{cens}_2({\mathcal {A}})\) holds for at least an ABox \({\mathcal {A}}\). Furthermore, given a CQE specification \({\mathcal {J}}\) and a GA censor \(\textsf{cens}(\cdot )\) for \({\mathcal {J}}\), we say that \(\textsf{cens}(\cdot )\) is an optimal GA censor for \({\mathcal {J}}\) if there does not exist a GA censor \(\textsf{cens}'(\cdot )\) for \({\mathcal {J}}\) that is more informative than \(\textsf{cens}(\cdot )\). As for CQ censors, given a CQE specification \({\mathcal {J}}\), there may exist more than one optimal GA censor for \({\mathcal {J}}\). We denote by \(\textsf{optGACens}({\mathcal {J}})\) the set of all optimal GA censors for \({\mathcal {J}}\).

Example 7

Consider the CQE instance \({\mathcal {E}}=\langle {\mathcal {J}},{\mathcal {A}}\rangle\) of Example 4 and the GA censors of Example 6. While \(\textsf{cens}_5(\cdot )\) and \(\textsf{cens}_6(\cdot )\) are optimal GA censors for \({\mathcal {J}}\), it is easy to see that \(\textsf{cens}_4(\cdot )\) is not. Indeed, by applying such censors to \({\mathcal {A}}\) one would get the following sets of ground atoms:

$$\begin{aligned} \begin{array}{r@{}l} \textsf{cens}_4({\mathcal {A}}) = \{&{} \textsf {Oncol}(ann), \textsf {Patient}(bob), \textsf {Patient}(tom) \}\\ \textsf{cens}_5({\mathcal {A}}) = \{&{} \textsf {Oncol}(ann), \textsf {Patient}(bob), \textsf {Cures}(sam,tom), \textsf {Patient}(tom) \}\\ \textsf{cens}_6({\mathcal {A}}) = \{&{} \textsf {Cures}(ann,bob), \textsf {Patient}(bob), \textsf {Cures}(sam,tom), \textsf {Patient}(tom) \} \end{array} \end{aligned}$$

Clearly, we have that \(\textsf{cens}_4({\mathcal {A}}) \subset \textsf{cens}_5({\mathcal {A}})\). Moreover, from the definition of such functions, one can verify that \(\textsf{cens}_4({\mathcal {A}}') \subseteq \textsf{cens}_5({\mathcal {A}}')\) for any ABox \({\mathcal {A}}'\). \(\square\)

As in the case of optimal CQ censors, also optimal GA censors enjoy the knowledge preservation property, i.e.: if \(\textsf{cens}(\cdot )\) is an optimal GA censor for a CQE specification \({\mathcal {J}}=\langle {\mathcal {T}},\mathcal{P} \rangle\), then \(\textsf{cens}({\mathcal {A}}) = \textsf {cl}_{\textbf{GA}}^{{\mathcal {T}}}({\mathcal {A}})\) holds for each ABox \({\mathcal {A}}\) such that \({\mathcal {T}}\cup \mathcal{P} \cup {\mathcal {A}}\) is consistent. It is easy to see that, in general, the same property is not satisfied by non-optimal GA censors.

We also observe that since \(\textsf {cl}_{\textbf{GA}}^{{\mathcal {T}}}({\mathcal {A}})\) is a finite set of ground atoms, it follows from Definition 7 that so is the set returned by a GA censor. As such, it can be represented as an ABox. This means that, unlike CQ censors, there is always a finite representation of the application of a GA censor to an ABox. As shown in [20], such materialization can be computed in polynomial time with respect to the size of the ABox in input.

We now turn our attention to the indistinguishability property and we show that GA censors enjoy it. The definition of the indistinguishability property must first be slightly modified to account for GA censors.

Definition 8

Let \({\mathcal {J}}= \langle {\mathcal {T}},\mathcal{P} \rangle\) be a CQE specification. A GA censor \(\textsf{cens}(\cdot )\) for \({\mathcal {J}}\) satisfies the indistinguishability property if, for each ABox \({\mathcal {A}}\), there exists an ABox \({\mathcal {A}}'\) such that (i) \({\mathcal {T}}\cup \mathcal{P} \cup {\mathcal {A}}'\) and (ii) \(\textsf{cens}({\mathcal {A}}) = \textsf{cens}({\mathcal {A}}')\).

We are now ready to provide the following result.

Proposition 1

[[18]] Every optimal GA censor for a CQE specification \({\mathcal {J}}\) satisfies the indistinguishability property.

Below we define the entailment in CQE under GA censors.

Definition 9

Let \({\mathcal {E}}=\langle {\mathcal {J}},{\mathcal {A}}\rangle\) be a CQE instance and \(\phi\) be an FO sentence. We say that \(\phi\) is GA-Cens entailed by \({\mathcal {E}}\), denoted by \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {GA}}^{\text {cqe}}\phi\), if \({\mathcal {T}}\cup \textsf{cens}({\mathcal {A}}) \models \phi\) holds for every \(\textsf{cens}(\cdot ) \in \textsf{optGACens}({\mathcal {J}})\).

Example 8

Consider the same CQE instance \({\mathcal {E}}=\langle {\mathcal {J}},{\mathcal {A}}\rangle\) and the same queries \(q_1\), \(q_2\), \(q_3\), and \(q_4\) of Example 4. One can verify that both \(q_1\) and \(q_4\) are GA-Cens entailed by \({\mathcal {E}}\), while both \(q_2\) and \(q_3\) are not. Note that, as shown in Example 4, the query \(q_2\) is instead CQ-Cens entailed by \({\mathcal {E}}\). \(\square\)

Since \(\textbf{CQ}\) is a more expressive language than \(\textbf{GA}\), one might wonder whether GA-Cens entailment is actually a sound approximation of CQ-Cens entailment. The following example shows that, in fact, the two entailment semantics are incomparable with each other.

Example 9

Let \({\mathcal {J}}= \langle {\mathcal {T}},\mathcal{P} \rangle\) be a CQE specification, with:

$$\begin{aligned} \begin{array}{rl} {\mathcal {T}}= \{ &{} A \sqsubseteq \exists P\ \}\\ \mathcal{P} = \{ &{}\forall x,y.B(x) \wedge P(x,y) \rightarrow \bot ,\\ &{}\forall x.A(x) \rightarrow \bot \} \end{array} \end{aligned}$$

Now, consider the ABox \({\mathcal {A}}= \{A(c), B(c)\}\). All optimal GA censors for \({\mathcal {J}}\), when applied to \({\mathcal {A}}\), will return the set \({\mathcal {C}}_1 = \{B(c)\}\). On the other hand, the optimal CQ censors will return one of the two following sets of BCQs:

$$\begin{aligned} \begin{array}{rlr} {\mathcal {C}}_2 =&{} \{\ B(c), \exists x,y.P(x,y), &{}\text {and all the queries in } \textbf{CQ}({\mathcal {T}}\cup \mathcal{P} )\text { inferred by them}\ \}\\ {\mathcal {C}}_3 =&{} \{\ \exists x.B(x), \exists y.P(c,y), &{}\text {and all the queries in }\textbf{CQ}({\mathcal {T}}\cup \mathcal{P} )\text { inferred by them}\ \} \end{array} \end{aligned}$$

Now, given the two BCQs \(q_1 = B(c)\) and \(q_2 = \exists x,y.P(x,y)\), we have that: \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {GA}}^{\text {cqe}}q_1\) but \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \not \models _{\text {CQ}}^{\text {cqe}}q_1\), while \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \not \models _{\text {GA}}^{\text {cqe}}q_2\) but \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {CQ}}^{\text {cqe}}q_2\). \(\square\)

This incomparability is also evidenced by the complexity results of the entailment problem of BCQs in CQE under GA censors showing that, if \(\text {DL-Lite} _{{\mathcal {R}}}\) is considered as ontology language, the problem is intractable in the case of GA censors.

Theorem 4

[ [28]] Let \({\mathcal {J}}=\langle {\mathcal {T}},\mathcal{P} \rangle\) be a CQE specification such that \({\mathcal {T}}\) is a \(\text {DL-Lite} _{{\mathcal {R}}}\) TBox, let \({\mathcal {A}}\) be an ABox, and let q be a BCQ. The problem of deciding whether \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {GA}}^{\text {cqe}}q\) is coNP-complete in data complexity.

On the other hand, the following result shows that instance checking is tractable and, actually, in AC \(^0\) in data complexity.

Theorem 5

[ [28]] Let \({\mathcal {J}}=\langle {\mathcal {T}},\mathcal{P} \rangle\) be a CQE specification such that \({\mathcal {T}}\) is a \(\text {DL-Lite} _{{\mathcal {R}}}\) TBox, let \({\mathcal {A}}\) be an ABox, and let g be a ground atom. The problem of deciding whether \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {GA}}^{\text {cqe}}g\) is in AC \(^0\) in data complexity.

As done in the previous section for CQ censors, we now consider an intersection-based censor also for the case of GA censors.

Definition 10

Let \({\mathcal {J}}= \langle {\mathcal {T}},\mathcal{P} \rangle\) be a CQE specification. An intersection-based GA censor (IGA censor) \(\textsf{cens}_{\cap }(\cdot )\) for \({\mathcal {J}}\) is a function that, given an ABox \({\mathcal {A}}\), returns the set \(\textsf{cens}_{\cap }({\mathcal {A}}) = \bigcap _{\textsf{cens}(\cdot ) \in \textsf{optGACens}({\mathcal {J}})} \textsf{cens}({\mathcal {A}})\).

Also in this case, for each CQE specification, there exists by definition only one IGA censor. We note that, given a CQE specification \({\mathcal {J}}\), the IGA censor for \({\mathcal {J}}\) is, according to Definition 7, also a GA censor for \({\mathcal {J}}\). Moreover, as stated in [18], IGA censors enjoy the indistinguishability property.

We now examine the data complexity of the problem of entailment of BCQs in CQE under IGA censors, again by considering \(\text {DL-Lite} _{{\mathcal {R}}}\) as ontology language. We first give the definition of entailment, which, as done in analogous definitions of entailment given before, is given with respect to a generic ontology language for the CQE specification.

Definition 11

Let \({\mathcal {E}}=\langle {\mathcal {J}},{\mathcal {A}}\rangle\) be a CQE instance and \(\phi\) be an FO sentence. We say that \(\phi\) is IGA-Cens entailed by \({\mathcal {E}}\), denoted by \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {IGA}}^{\text {cqe}}\phi\), if \(\textsf{cens}_{\cap }({\mathcal {A}}) \models \phi\) holds, where \(\textsf{cens}_{\cap }(\cdot )\) the IGA censor for \({\mathcal {J}}\).

It is straightforward to verify that IGA-Cens entailment is a sound approximation of the GA-Cens entailment.

Proposition 2

[ [17]] Let \({\mathcal {J}}\) be a CQE specification, \({\mathcal {A}}\) be an ABox, and \(\phi\) be a FOL sentence. If \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {IGA}}^{\text {cqe}}\phi\), then \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {GA}}^{\text {cqe}}\phi\).

However, differently from the case of CQ censors, entailment under GA censors does not coincide in general with entailment under the IGA censor. The following example shows, indeed, that the converse of Proposition 2 does not necessarily hold.

Example 10

Let \({\mathcal {J}}\) and \({\mathcal {A}}\) be as in Example 7. The IGA censor \(\textsf{cens}_{\cap }(\cdot )\) for \({\mathcal {J}}\) is such that \(\textsf{cens}_{\cap }({\mathcal {A}}) = \{\ \textsf {Patient}(bob), \textsf {Cures}(sam,tom), \textsf {Patient}(tom)\ \}\).

Consider now the queries of Example 8. While both \(q_1\) and \(q_4\) are GA-Cens entailed by \(\langle {\mathcal {J}},{\mathcal {A}}\rangle\), it is immediate to see that \(q_4\) is the only query IGA-Entailed by \(\langle {\mathcal {J}},{\mathcal {A}}\rangle\). \(\square\)

We conclude this section by providing the data complexity of IGA-Cens entailment of BCQs.

Theorem 6

[ [17, 18]] Let \({\mathcal {J}}=\langle {\mathcal {T}},\mathcal{P} \rangle\) be a CQE specification such that \({\mathcal {T}}\) is a \(\text {DL-Lite} _{{\mathcal {R}}}\) TBox, let \({\mathcal {A}}\) be an ABox, and let q be a BCQ. The problem of deciding whether \(\langle {\mathcal {J}},{\mathcal {A}}\rangle \models _{\text {IGA}}^{\text {cqe}}q\) is in AC \(^0\) in data complexity.

As shown by Theorem 6, with respect to GA-Cens entailment, the adoption of the IGA censor allows for a significant improvement in data complexity of query answering, still preserving the fact that we are adopting a kind of censor that enjoys the indistinguishability property.

Conclusions

In this paper, we provided an introduction to Controlled Query Evaluation in the context of Description Logics ontologies, by presenting a series of results from our previous investigations. We also introduced the new notion of the ICQ censor and studied its relationship with CQ censors. We have therefore considered four different CQE semantics, each of which is based on a different notion of censor, also highlighting some of their fundamental properties, namely, knowledge preservation, indistinguishability, and the possibility of materializing a censor. We also recalled the data complexity results of evaluating conjunctive queries on CQE instances by considering each of such semantics in the case where the TBox is expressed in \(\text {DL-Lite} _{{\mathcal {R}}}\), i.e., the logical underpinning of the OWL 2 profile OWL 2 QL.

We remark that our approach is both fully declarative and semantically neat, which are its distinguishing features: the policy is expressed through logic formulas and confidentiality is enforced by automated reasoning on the CQE specification, according to the formal semantics adopted for censors. In other terms, to ensure data confidentiality it is only necessary to encode the privacy requirements into formulas, thus abstracting away from any underlying implementation mechanism.

Two main limitations of the techniques that we have described in this paper are the limited expressiveness of both the TBox and the policy language and the impossibility of expressing preferences for operating a further selection among optimal censors. To overcome the above limitations, in [18, 19] we extended both the ontology language and the policy language, showing that, for the case of the IGA censor, the computational complexity of BCQ entailment does not increase. More specifically, in the first work we considered \(\text {DL-Lite} _{{\mathcal {A}}}\) as ontology language and denials with inequalities (under some safeness conditions) for expressing the policy and, in the latter, we adopted \(\text {DL-Lite} ^{\mathcal {H}}_{horn}\) for expressing the TBox and allowing denials with number restrictions in the policy. Moreover, in [18], we proposed an intersection-based notion of censor that takes into account preferences between ontology predicates for inducing a choice among the set of all the optimal censors and, consequently, obtaining a less severe censor (in terms of disclosable information). Finally, the CQE semantics presented in [10] is based on the so-called longest honeymoon approach, where, given a sequence of user queries, a (progressively refined) subset of optimal GA censors is selected for answering them, with the aim of delaying as much as possible the necessity of censoring the answer. This property is called maximal cooperativeness, and BCQ entailment under this semantics has been shown to still be in \(\textsc {AC}^0\) in data complexity.

We believe that all these results open the way towards practical implementations of CQE engines for DL ontologies and Ontology-based Data Access (OBDA) [29, 37]. Some interesting results in this direction have already been presented in [17], where we have experimented IGA-Cens entailment over an OBDA benchmark. Such an experimental evaluation shows that controlled query evaluation can be realized in the practice using off-the-shelf OBDA engines. Analogous results have been obtained for the extended framework of [18].

The study of the CQE problem can still be extended in several directions. First, the PTIME upper bound for CQ-Cens entailment of BCQs over \(\text {DL-Lite} _{{\mathcal {R}}}\) CQE instances should be refined. In addition, it may be of interest to cover the entire family of OWL 2 profiles, thus deepening the analysis of CQE for ontologies expressed in OWL 2 EL and OWL 2 RL, which was already started in [22, 23, 28].

Finally, the investigation of different forms of policy could be still pursued, to improve the abilities of the CQE framework in the enforcement of confidentiality.

The main contribution of this work is the collection in a single treatment of a series of our fundamental results on CQE in DL ontologies, and in particular in \(\text {DL-Lite}\), previously appeared in various papers.

introduction to the problem, built through basic definitions, examples, and pointers to the original works, to which we refer the reader for further technical aspects.

advancements we achieved in the last years in this latter field. More precisely, we recall the general framework for CQE given in [28], focus on two kinds of censors proposed in that paper, namely CQ censors and GA censors, and report their most crucial characteristics, previously shown in

In this paper, we have studied the approach to CQE based on instance indistinguishability and identified a semantically well-founded notion of CQE that enjoys first-order rewritability in the case of \(\text {DL-Lite} _{{\mathcal {R}}}\) ontologies. We believe that this result opens the way towards practical implementations of CQE engines for DL ontologies and Ontology-based Data Access. We are currently working to achieve this goal. Another important future direction is a deeper study of the user model. Our framework inherits from its predecessors a relatively simple model, which assumes that the user knows (at most) the TBox and all the query answers returned by the system, and considers only the deductive abilities of the user over such knowledge. This user model might need to be enriched to capture more realistic data protection scenarios.