Ontological databases with faceted queries

The success of the use of ontology-based systems depends on efficient and user-friendly methods of formulating queries against the ontology. We propose a method to query a class of ontologies, called facet ontologies (fac-ontologies), using a faceted human-oriented approach. A fac-ontology has two important features: (a) a hierarchical view of it can be defined as a nested facet over this ontology and the view can be used as a faceted interface to create queries and to explore the ontology; (b) the ontology can be converted into an ontological database, the ABox of which is stored in a database, and the faceted queries are evaluated against this database. We show that the proposed faceted interface makes it possible to formulate queries that are semantically equivalent to $\mathcal{SROIQ}^{Fac}$, a limited version of the $\mathcal{SROIQ}$ description logic. The TBox of a fac-ontology is divided into a set of rules defining intensional predicates and a set of constraint rules to be satisfied by the database. We identify a class of so-called reflexive weak cycles in a set of constraint rules and propose a method to deal with them in the chase procedure. The considerations are illustrated with solutions implemented in the DAFO system (data access based on faceted queries over ontologies).


Introduction
In the last two decades, we have observed a steady increase in the number of applications based on ontology-oriented technologies. The reason is that ontologies provide a well-formulated and precise knowledge specification of the conceptualization of the application domain, and enrich query answering with intensional knowledge not explicitly captured by the extensional part of the ontology. It is known that classic reasoning problems in ontologies can be reduced to query answering problems [9]. Moreover, in many data-intensive applications, these inference capabilities are dominated by the need to respond to queries. To meet these needs, the extensional part of the ontology is often stored using robust and mature database technologies. Thus, we are witnessing the synergistic combination of ontologies and databases, which results in developing modern database solutions such as ontology-based data access (OBDA) [18,59,75], ontology-enhanced databases [7], ontological databases [37], and virtual knowledge graphs (VKG) [76].
Tadeusz Pankowski, tadeusz.pankowski@put.poznan.pl, Institute of Computing Science, Poznań University of Technology, Poznan, Poland
Query and exploration tools for human-centered interaction remain an important issue in ontology-based access to data. To make a query interface easier to use, we propose a new concept of a faceted view of the ontology in the form of a hierarchical nested facet (as an extension of the "flat" facet investigated in [7]). The faceted view of an ontology depends on the intended query, whose template is provided by the user. A relevant part of the ontology, in the form of a spanning tree covering the query template, is produced in response as a faceted view. A faceted interface is a faceted view equipped with a set of operations for creating queries and for extending the faceted view through ontology exploration. We propose a set of operations that allows creating queries with expressive power equivalent to (a slightly limited) SROIQ [44,48,61].
We assume that there are two sets of rules in a faceted ontology (fac-ontology): (1) V is a set of rules defining intensional predicates (views). Intensional predicates enrich the vocabulary and are used to facilitate query formulation. (2) C is a set of constraint rules which are expected to be satisfied by the ABox A, i.e., they are materialized in the ABox by means of the chase procedure. We investigate the termination of the chase for a new class of weak cycles in C, which we call reflexive weak cycles, and show that the chase terminates in the presence of reflexive weak cycles, giving a solution (but not the universal solution). We do not consider rewriting rules because we assume that after the chase with respect to constraint rules, the ABox obeys all constraint rules. The ABox is stored in a relational database.
The presented approach has been implemented and verified in the DAFO system [24,55,57].

Contribution
The novelties in this paper are as follows:
1. Defining so-called faceted ontologies (fac-ontologies) and a concept of the faceted view over them. We use the faceted view to propose a faceted interface to create faceted queries over fac-ontologies. We show that the faceted interface allows formulating faceted queries equivalent (with some limitations) to queries (class expressions) in SROIQ. Query answering is then in LogSpace in the size of the ABox.
2. Identifying a new class of weak cycles, called reflexive weak cycles, in the dependency graph of a class C of constraint rules in a fac-ontology. The proposed modification of the chase algorithm produces a solution, although the universal solution does not exist.

Outline
The paper is organized as follows. In Sect. 2, we define an ontology used as a running example, and formulate the motivations underlying the research. The notions of fac-ontologies and ontological databases are introduced in Sect. 3. In Sect. 4, we define the concept of a nested facet, which is next used to define faceted views and faceted interfaces over ontological databases. Faceted queries are defined and investigated in Sect. 5. We show the relationship between faceted queries and queries (concept expressions) in the SROIQ description logic. Some subclasses of faceted queries and their graphical forms created by the faceted interface are presented in Sect. 6. In Sect. 7, we discuss the creation of an ontological database for a given fac-ontology; in particular, we investigate the problem of the termination of the chase procedure in the presence of reflexive weak cycles. Related work and novelties of the paper are analyzed in depth in Sect. 8. Section 9 summarizes and concludes the paper.

A sample ontology BibOn
As a running example, we consider a bibliographic ontology BibOn with the schema graph in Fig. 1. A schema graph is a directed graph with nodes labeled by classes, and edges labeled by properties. Unlabeled edges (with triangular arrows) denote subsumption relations between classes and between properties. Classes are unary predicates, and properties are binary predicates. Both classes and properties are divided into extensional and intensional ones. Extensional predicates are materialized in the ABox, while intensional ones are views defined by rules. There are two rationales behind using intensional predicates.
1. Simplification of the syntax of rules. In most implemented systems, classes and properties are restricted to being atomic names [9]. This can be achieved by using intensional predicates (views). For example, the intensional predicate (class) ACMConf can be defined by the following definition rule: Conference ⊓ ∃organizer.{ACM} ≡ ACMConf. Then ACMConf is an atomic class name and its definition consists of extensional predicates Conference and organizer and a constant ACM. Similarly, the definition rule Paper ⊓ ∃presentedAt.ACMConf ≡ ACMPaper defines a subclass ACMPaper. Then ACMPaper is an intensional predicate that can be unfolded to an extensional expression. For example, the subsumption ACMPaper ⊑ Paper abbreviates the following subsumption between class expressions: Paper ⊓ ∃presentedAt.(Conference ⊓ ∃organizer.{ACM}) ⊑ Paper.

2. Enriching the vocabulary of the ontology. Intensional predicates enrich the vocabulary of the ontology, thus facilitating the formulation of queries. They are, in fact, views defined over extensional predicates. During query rewriting, they are unfolded using their definitions.
In Fig. 1, extensional classes and properties are drawn with solid lines, while intensional with dashed lines.
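The unfolding of intensional classes described above can be sketched in a few lines of Python. This is only an illustration: the tuple encoding of class expressions, the `VIEWS` dictionary and the `unfold` function are hypothetical names introduced here, and the definition assumed for ACMConf (conferences whose organizer is ACM) follows the description in the running example. The sketch handles intensional classes only and assumes the definitions are acyclic.

```python
# Hypothetical encoding of class expressions as nested tuples:
#   ("class", A)        atomic class A
#   ("and", q1, q2)     conjunction
#   ("or", q1, q2)      disjunction
#   ("not", q)          negation
#   ("some", P, q)      existential restriction  ∃P.q
#   ("value", P, a)     restriction to a constant  ∃P.{a}

# Assumed definitions of the intensional classes from the running example.
VIEWS = {
    "ACMConf": ("and", ("class", "Conference"), ("value", "organizer", "ACM")),
    "ACMPaper": ("and", ("class", "Paper"),
                 ("some", "presentedAt", ("class", "ACMConf"))),
}


def unfold(q, views):
    """Replace every intensional class in q by its (recursively unfolded) definition."""
    op = q[0]
    if op == "class":
        name = q[1]
        return unfold(views[name], views) if name in views else q
    if op in ("and", "or"):
        return (op, unfold(q[1], views), unfold(q[2], views))
    if op == "not":
        return ("not", unfold(q[1], views))
    if op == "some":
        return ("some", q[1], unfold(q[2], views))
    return q  # "value" carries no nested class expression


unfolded = unfold(("class", "ACMPaper"), VIEWS)
```

After unfolding, `unfolded` mentions only the extensional names Paper, presentedAt, Conference and organizer.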
Below, we list some rules in BibOn using the standard description logic notation, for example the property subsumption correspAuthor ⊑ writtenBy.
4. Domains and ranges of properties. If P is a property, then ∃P ⊑ A specifies that the domain of P is subsumed by A. We denote by dom(P) the least upper bound of the set of classes subsuming ∃P. Similarly, by rng(P) we denote the least upper bound of classes subsuming ∃P⁻, e.g., dom(name) = Person, rng(name) = Data, dom(correspAuthor) = Paper, rng(correspAuthor) = Author.

5. Mandatory membership of classes (or totality of properties). A ⊑ ∃P specifies that A has mandatory membership in the domain of P, or that P is total on A. Similarly, A ⊑ ∃P⁻ says that A has mandatory membership in the range of P, or that P⁻ is total on A, e.g., Person ⊑ ∃name and Author ⊑ ∃writtenBy⁻.

6. Functionality of properties: name is a functional data property, (funct name); inProceed is a functional object property, (funct inProceed).

Motivation of the paper
In Fig. 2, we show a faceted query tree that formulates the request: "Get persons who: (a) are authors of at least ten papers presented at ACM or IEEE conferences, and (b) are not affiliated at the 'PUT' ('Poznan University of Technology'), and (c) served as PC members at conferences where they presented their papers." In the faceted query tree in Fig. 2:
1. The root is labeled by the distinguished property root; we assume that root is a reflexive universal property, i.e., ∀x, y (root(x, y) ↔ x = y).
2. Every node (rectangle) is a class-node (labeled by a class), a property-node (labeled by a property) or a constant-node (labeled by a constant).
3. A node is labeled either by ⊔ or by ⊓ (the label is drawn below the node), and the set of its children is then either disjunctive or conjunctive.
4. Nodes can also be labeled by a number restriction (≥ 10), negation (¬), or local reflexivity (Self); these labels are drawn above the node.
5. The semantics of the query is defined by its translation to a DL SROIQ expression: ∃root.(Person ⊓ ((≥ 10) authorConf.(ACMConf ⊔ IEEEConf) ⊓ ¬∃affiliation.{PUT} ⊓ ∃authConfPCMember.Self)). Note that ∃root.Q ≡ Q, so ∃root can be omitted.
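To make the reading of such a tree concrete, the following Python sketch renders a small faceted query tree as a description logic string. It is a simplified illustration and not the formal translation used in the paper: the `Node` class, its field names and the output format (for example `≥10 P.(...)`) are assumptions made here, and negation is supported on property nodes only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    kind: str                           # "class", "property", or "constant"
    children: List["Node"] = field(default_factory=list)
    connective: Optional[str] = None    # "and" / "or"; None selects the default
    negated: bool = False
    at_least: Optional[int] = None      # number restriction (≥ k)
    self_loop: bool = False             # local reflexivity (Self)


def to_dl(node):
    """Render a faceted query tree as a DL expression string."""
    if node.kind == "constant":
        return "{" + node.label + "}"
    # Defaults: children of a property node are disjunctive, of a class node conjunctive.
    conn = node.connective or ("or" if node.kind == "property" else "and")
    symbol = " ⊔ " if conn == "or" else " ⊓ "
    parts = [to_dl(child) for child in node.children]
    if node.kind == "class":
        if not parts:
            return node.label
        if conn == "and":
            return " ⊓ ".join([node.label] + parts)
        return node.label + " ⊓ (" + " ⊔ ".join(parts) + ")"
    filler = symbol.join(parts)
    if node.label == "root":
        return filler                   # ∃root.Q ≡ Q, so the root is omitted
    if " " in filler:
        filler = "(" + filler + ")"
    suffix = "." + filler if filler else ""
    if node.self_loop:
        expr = "∃" + node.label + ".Self"
    elif node.at_least is not None:
        expr = "≥" + str(node.at_least) + " " + node.label + suffix
    else:
        expr = "∃" + node.label + suffix
    return "¬" + expr if node.negated else expr


tree = Node("root", "property", [
    Node("Person", "class", [
        Node("authorConf", "property",
             [Node("ACMConf", "class"), Node("IEEEConf", "class")], at_least=10),
        Node("affiliation", "property", [Node("PUT", "constant")], negated=True),
        Node("authConfPCMember", "property", self_loop=True),
    ]),
])
expression = to_dl(tree)
```

For the tree of Fig. 2 encoded this way, `expression` is `Person ⊓ ≥10 authorConf.(ACMConf ⊔ IEEEConf) ⊓ ¬∃affiliation.{PUT} ⊓ ∃authConfPCMember.Self`.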
The tree-shaped structure of a faceted query also provides a faceted view of an ontology. In order to be viewed and queried in this way, an ontology must satisfy properties implied by the following assumptions:
1. All the nodes connected by ⊔ or ⊓ have a common parent node, and all of them are in the same inheritance hierarchy. The inheritance hierarchies are pairwise disjoint, and each hierarchy has a unique top class (the least upper bound). The top class is also used to compute the negation of a query (the complement with respect to the top class).
2. Every path of nodes in a faceted query tree is of the form (root, A_1, ..., P_n, A_n), (root, A_1, ..., P_n), or (root, A_1, ..., P_n, a_n), n ≥ 1, where A, P, and a (with subscripts) denote a class, a property, and a constant, respectively. This can be achieved if every class is defined as a specialization of its superclass.
3. The set of all predicates is divided into extensional predicates (which can occur in the ABox A) and intensional predicates (defined by rules). The extensional predicates are materialized by means of a chase procedure in the ABox and in a database.
The running example and the above comments serve as a motivation for the following research problems presented in this paper.
1. Faceted ontologies (fac-ontologies). We identify fac-ontologies as a class for which the proposed faceted approach can be applied.
2. Faceted-oriented query formulation. We develop a method based on the faceted approach. We examine the expressive power of this approach and compare it with other faceted-oriented query systems.
3. Creating a (faceted) ontological database for answering faceted queries. We define and consider the so-called reflexive weak cycles, and propose a method of chasing the ABox in the presence of these cycles.

Ontological database
Ontologies are commonly considered the best method to specify conceptualizations of application domains (conceptual models) [8]. In database design, a conceptual model is presented as an entity-relationship (ER) diagram [22], an expanded entity-relationship (EER) diagram [28] or a UML class diagram. A family of ontologies which can be used for the formal specification of such conceptual models is the DL-Lite family [8,17]. The importance and usefulness of the DL-Lite family is testified by the fact that it is the basis of the W3C OWL 2 QL standard [54]. In this paper, we define a class of ontologies called faceted ontologies (fac-ontologies). In general, from a methodological point of view, each fac-ontology has the following properties:
1. The terminological part of the fac-ontology is a formal specification of the conceptual schema of an application domain created by means of the EER model. In particular, we follow the idea of specifying subclasses by means of the specialization abstraction [28].
2. Every finite part of the terminological part of the fac-ontology must be representable through a faceted interface (Sect. 4.3). In particular, in any class hierarchy, each subclass is a specialization of the top class of this hierarchy.

Faceted ontologies and ontological database
In this section, we define an ontological database. We denote by UP = UP_E ∪ UP_I an infinite set of unary predicates (classes), where UP_E is a set of extensional, and UP_I a set of intensional classes. Analogously, we denote by BP = BP_E ∪ BP_I an infinite set of binary predicates (properties) consisting of a set of extensional (BP_E) and intensional (BP_I) properties. Const denotes an infinite set of constants (individual names). We also assume that a set of labeled nulls, LabNull ⊆ Const, is a subset of constants, for which the UNA (Unique Name Assumption) does not hold [1]. In particular, two different labeled nulls can denote the same individual, i.e., N_1^I = N_2^I for an interpretation I. Regular constants, i.e., constants from Const \ LabNull, obey UNA. It means that two different regular constants always denote different individuals.
Data is a distinguished class of data values (strings, numbers, etc.). The other classes are object classes and their instances are objects. Both data values and objects are represented by constants. Every property with the range in the class Data is a data property; otherwise, it is an object property.
We now define the syntax of rules.
2. Constraint rules are rules built from extensional predicates and conforming to the syntax:

Definition 2 A set Class_H of classes is an inheritance hierarchy if it is a finite bounded complete partial order, where ⊤_H ∈ Class_H is the least upper bound (the top class) in H.
We will also denote by ⊤_A the top class ⊤_H of the inheritance hierarchy to which a class A belongs.

Definition 3 A class A is a specialization of a class A_0 in a set of rules if the rule A_0 ⊓ C ≡ A is derivable in this set, and the syntax of C is given in Definition 1.

SROIQ Fac: a subset of SROIQ
We will consider SROIQ Fac as a query language, which is a subset of SROIQ [44,46,48,61]. The syntax of SROIQ Fac is defined by the grammar where A is a class, P is a property, a is a constant, and k is an integer, k ≥ 1.
We will use SROIQ Fac as a reference language for faceted queries and assume that it satisfies the following restrictions:

1. A nominal {a} can occur only in existential restrictions, i.e., in ∃R.{a}.
2. Every query q in SROIQ Fac has a uniquely determined type, type(q), where:
Formally, the semantics of SROIQ Fac is defined in terms of an interpretation I consisting of a non-empty set Δ^I (the domain of the interpretation) and an interpretation function ·^I, which assigns: to every atomic class A a set A^I ⊆ Δ^I, to every atomic property P a binary relation P^I ⊆ Δ^I × Δ^I, and to every data name a an element a^I ∈ Data^I. Table 1 shows how to obtain the semantics of each compound expression from the semantics of its parts [9,48]. Note that the semantics of the negation of q is defined relative to type(q), i.e., to the top class (the "local universal class") of the inheritance hierarchy containing the class of answers to q.
An interpretation I satisfies the subsumptions C_1 ⊑ C_2 and S_1 ⊑ S_2 if C_1^I ⊆ C_2^I and S_1^I ⊆ S_2^I, respectively. I satisfies A_1 ⊑ ¬A_2 and R_1 ⊑ ¬R_2 if A_1^I ∩ A_2^I = ∅ and R_1^I ∩ R_2^I = ∅, respectively, and satisfies (funct S) if S^I is a partial function. We say that I is a model of a TBox or an ABox if it satisfies all subsumptions and facts in it. An ABox A is consistent with C if A and C have a common model. If I is a model of O, then we denote this as I ⊨ O.
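The set-based semantics can be illustrated with a small evaluator over a finite interpretation. This is a minimal sketch under stated assumptions: the tuple encoding of queries and the function name `evaluate` are hypothetical, only a few constructors are covered, and the top class used as the guard of a negation is passed explicitly instead of being computed from type(q).

```python
def evaluate(q, classes, props):
    """Return the set of individuals satisfying q in a finite interpretation.

    classes: dict mapping a class name to a set of individuals
    props:   dict mapping a property name to a set of (subject, object) pairs
    """
    op = q[0]
    if op == "class":
        return set(classes.get(q[1], ()))
    if op == "and":
        return evaluate(q[1], classes, props) & evaluate(q[2], classes, props)
    if op == "or":
        return evaluate(q[1], classes, props) | evaluate(q[2], classes, props)
    if op == "not":
        # ("not", q, top): complement relative to the given top class
        return set(classes[q[2]]) - evaluate(q[1], classes, props)
    if op == "some":
        # ("some", P, q): individuals with a P-successor in q
        fillers = evaluate(q[2], classes, props)
        return {x for (x, y) in props.get(q[1], ()) if y in fillers}
    if op == "value":
        # ("value", P, a): individuals connected via P with the constant a
        return {x for (x, y) in props.get(q[1], ()) if y == q[2]}
    raise ValueError(f"unsupported constructor: {op}")


classes = {"Person": {"ann", "bob", "eve"}}
props = {"affiliation": {("ann", "PUT"), ("bob", "MIT")}}

# Person ⊓ ¬∃affiliation.{PUT}, with the negation guarded by the top class Person
query = ("and", ("class", "Person"),
         ("not", ("value", "affiliation", "PUT"), "Person"))
answers = evaluate(query, classes, props)
```

Here `answers` contains bob and eve: eve has no affiliation at all, but she still belongs to the guard class Person, so the guarded complement keeps her.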

Query answering in ontological databases
Let q be a query in SROIQ Fac. We denote by q(x) a first-order form of the query q, which can be obtained using rules proposed in [61]. A certain answer to q(x) over a fac- is an ontological database, then C is satisfied in A_db, and is immaterial in query answering, i.e., the equality holds: To eliminate intensional predicates in q, we unfold rules in V. Then every intensional predicate is replaced by its extensional definition. We obtain a query q[V], in which only extensional predicates appear, and the equality holds: and M is a set of dependencies of the form: where α ranges over unary and binary atoms, and R is an n-ary relation name in Sch, n ≥ 1. Additionally, we assume that D only has data "exported" from A_db, and each labeled null is mapped to NULL.
The mapping M is used in the translation of q[V] into an SQL query q[V,M] over the database D. As a result, the set of answers to q over O_db coincides with the set of answers to q[V,M] over the database, i.e.,

Faceted views and faceted interfaces over ontological databases
We now propose a method of query formulation against an ontological database. The method is based on the idea of faceted search [70] and faceted queries [7]. We start with a concept of the nested facet as a generalization of the ("flat") facet proposed in [7].

Nested facet
A facet is a pair (X, ♦Γ), where ♦ ∈ {⊓, ⊔} and Γ is a non-empty set, where either: (a) X = type and Γ is a set of classes, or (b) X is a binary predicate (a property) and Γ contains a distinguished symbol any and a set of individual names (constants) or a set of classes. A facet of the form (X, ⊓Γ) is conjunctive, and a facet of the form (X, ⊔Γ) is disjunctive.
We will extend the above notion of facets to nested facets, which have the form of a tree. In this tree-oriented notation, a facet (X, ♦Γ) will be written as a tree ♦X(Γ), where ♦X is the labeled root, and Γ is a set of its children, interpreted as a conjunctive or disjunctive set depending on ♦. Note that a facet defined in [7] always forms a two-level tree.
In commercial applications, faceted queries are usually limited to flat facets, where a flat facet corresponds to the possible values of one data property. A query then consists of several facets, each of which relates to a different data property (aspect) of the searched object (e.g., price, manufacturer, size, in the case of mobile phones). We extend this approach by introducing object properties and subsumptions between classes and properties. This requires nested faceted queries. The nested form of faceted queries is also due to the fact that a faceted query is a (complex) Description Logic query, and its syntax tree is nested.
We define a nested facet as a finite multi-level tree with at least two levels.

Definition 6 Let A, P, and a, possibly with subscripts, be a class, a property and a constant, respectively. Let root be a distinguished property not in BP, and ♦ ∈ {⊓, ⊔}. A nested facet f is an expression with the syntax defined by the grammar: where k, l, m, n ≥ 1.
Every expression ♦X{Y_1, ..., Y_n}, n ≥ 0, of the category f, u, and t is a tree with the root X and a set (possibly empty) of subtrees, which are children of X. The tree is labeled by ♦. By default, if X ∈ BP ∪ {root}, then ♦ = ⊔, otherwise ♦ = ⊓, i.e., the set of property-node children is disjunctive, while the set of class-node children is conjunctive. P{a_1, ..., a_n} abbreviates the tree P{{a_1}, ..., {a_n}}. The trees A{} and P{} are abbreviated by A and P, respectively.
Example 2 A nested facet corresponding to the sentence "Persons who are authors of papers presented at ACM or IEEE conferences, and affiliated in PUT", is: Its graphical form is shown in Fig. 3. Note that in the graphical representation, the logical connective is drawn below the labeled node.
Its semantics is given by the following query in SROIQ Fac (a formal translation is given later on in Definition 10):

Faceted view
If a nested facet relates to a fac-ontology, then we call it a faceted view of this ontology. If u is a class-node of a nested facet (Definition 6), then we denote by ρ(u) the root class of u, i.e., ρ(♦A{t_1, ..., t_n}) = A, and similarly, by ρ(t) we denote the root property of t.
In contrast to a structural graph (Fig. 1) that provides a graph-oriented view of the entire ontology, the aim of a faceted view is to provide a tree-oriented view (a hierarchical view) of the part of this ontology relevant for the intended query.
A faceted view over the ontology is the basis of faceted search, where the formulation of queries proceeds over a hierarchical view of this ontology [25,77].

Faceted interface
We now show how a faceted view over a fac-ontology can be used to build a faceted interface supporting query formulation. Intuitively, a faceted interface is a faceted view equipped with operational features that allow labeling, extending and narrowing the view.
1. N is a set of nodes, E ⊆ N × N is a set of edges defining a tree rooted in r, and λ is a node labeling function (N_L ⊆ N is a set of leaves): restriction is bound to an object property node v.
In Fig. 4, there is a faceted interface in the form implemented in the DAFO system. On the left-hand side, we see a sample initial faceted interface on which a user can operate using operations presented in the context menu shown on the right-hand side.
A faceted interface in DAFO is implemented as a labeled AND/OR tree such that: 1. Nodes are labeled by classes (class nodes), properties (property nodes), and constants (constant nodes). Constant nodes are visible only on demand. 2. There is a distinguished root, which is a property node labeled by the universal reflexive property root. 3. At the beginning, any class node has the black color and the AND ("∧") label meaning that the set of its children is interpreted as a conjunctive set, while each property node has the red color and the OR ("∨") label, denoting that the set of its children is a disjunctive set.
A user operates interactively and iteratively on a faceted interface while building a faceted query. The user can refine the query by operating on this interface and by browsing and exploring the application domain modeled by the underlying fac-ontology. The operations on a faceted interface are listed in the context menu shown in Fig. 4.
Using the context menu, a user can set or change labels assigned to nodes, as well as enlarge or reduce the interface. The set of labels defines the state of a node. A state is a quintuple determined by functions defined in Definition 8, i.e., Sel(), AndOr(), PosNeg(), NumRestr(), and Self().
In addition, the following operations are used to modify the content of a faceted interface:
1. AttributeValues(v) is applicable to data property nodes to upload (from a database), insert, edit or remove values of this data property.
2. Clone(v) (duplicate) clones the subtree rooted in the indicated node. A new subtree is created and inserted as a child of the indicated node, and the inserted subtree is isomorphic, up to node identifiers, with the cloned subtree. It is applicable to class nodes and property nodes except the root node.
3. Explore(v) displays invisible parts of the fac-ontology and includes them in the faceted interface.
4. RemoveAllUnchecked(v) removes from the faceted interface all unselected (unchecked) nodes.
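As an illustration of how such operations act on the tree, the sketch below models a faceted interface as a tree of selectable nodes and implements only the removal of unchecked nodes, together with a check that every remaining node is selected. The class `FacetNode` and both function names are hypothetical, and the sketch assumes that removing an unselected node also removes its whole subtree; it does not reproduce the DAFO implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FacetNode:
    label: str
    selected: bool = True
    children: List["FacetNode"] = field(default_factory=list)


def remove_all_unchecked(node):
    """Drop every unselected descendant of `node` together with its subtree."""
    node.children = [child for child in node.children if child.selected]
    for child in node.children:
        remove_all_unchecked(child)


def all_selected(node):
    """True if the node and all of its descendants are selected."""
    return node.selected and all(all_selected(child) for child in node.children)


# A small interface: a person connected to conferences as an author or as a PC member.
pc_member_of = FacetNode("pcMemberOf", children=[FacetNode("ACMConf")])
interface = FacetNode("root", children=[
    FacetNode("Person", children=[
        FacetNode("authorConf", children=[FacetNode("ACMConf"), FacetNode("USAConf")]),
        pc_member_of,
    ]),
])

pc_member_of.selected = False          # corresponds to unchecking the node
before = all_selected(interface)
remove_all_unchecked(interface)
after = all_selected(interface)
```

After the removal, the Person node keeps only the authorConf subtree, and every remaining node is selected.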
A graphical form of a state of a sample faceted interface is shown in Fig. 2. It arises from Fig. 3 by: adding one subtree as a result of using the Explore(Person) operation, adding a number restriction to the authorOf node, negating the affiliation node, and labeling the authorConfPCMember node by Self.

Definition 9 A faceted query is a faceted interface in which every node is selected.
The semantics of a faceted query F is defined by translating it into a description logic expression. We assume that root is the universal reflexive property with the semantics: ∀x, y (root(x, y) ↔ x = y).
A SROIQ Fac query with the syntax conforming to the syntax of faceted queries will be called a query in a faceted normal form (FacNF). The following proposition follows from Definition 10.

Example 3
The following SROIQ Fac queries are not in FacNF: In Example 4, they will be transformed into FacNFs.

Transformation of SROIQ Fac into FacNF
We will show that the expressive power of the faceted query system is equal to that of SROIQ Fac . To this end, we will show that every SROIQ Fac query can be transformed into a query in FacNF. This means that for any query in SROIQ Fac there is a semantically equivalent query that can be formulated using the faceted interface.

Theorem 1 Every query in SROIQ Fac over a fac-ontology O = (V, C, A) can be transformed into an equivalent query in FacNF over O.
Proof Let q be a query in SROIQ Fac over O. We transform q into FacNF as follows:
1. Every class A occurring in q is replaced by the left-hand side of its specialization (unfolding the specialization): replace(A, ⊤_A ⊓ C), and q_spec arises from q by replacing A with ⊤_A ⊓ C, where the syntax of C is given in Definition 1.
2. q_spec is converted into disjunctive normal form (DNF): q_dnf = q_1 ⊔ ... ⊔ q_m, where q_i, 1 ≤ i ≤ m, is a conjunction of queries in SROIQ Fac.
3. Every disjunct q_i, 1 ≤ i ≤ m, containing the negation of a top class is removed from q_dnf, since it evaluates to the empty set. The reduced disjunction is: q_red = q_1 ⊔ ... ⊔ q_n, n ≤ m.
4. Every disjunct q_i, 1 ≤ i ≤ n, in q_red can be rewritten into the form:
5. The procedure is applied recursively to every subquery q appearing in formulas of the form ∃P.q.
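Steps 2 and 3 of the proof can be sketched as follows. This is an illustration only, not the procedure implemented in DAFO: queries are encoded as nested tuples over opaque atoms (strings), negated atoms and existential restrictions are treated as atoms, and the names `dnf` and `drop_empty` are introduced here for the example.

```python
from itertools import product


def dnf(q):
    """Return q as a list of disjuncts, each disjunct being a list of atoms.

    q is either an atom (a string) or a tuple ("and" | "or", left, right).
    """
    if isinstance(q, str):
        return [[q]]
    op, left, right = q
    if op == "or":
        return dnf(left) + dnf(right)
    if op == "and":
        # Distribute conjunction over the disjuncts of both operands.
        return [l + r for l, r in product(dnf(left), dnf(right))]
    raise ValueError(f"unsupported connective: {op}")


def drop_empty(disjuncts, top_classes):
    """Remove disjuncts that contain the negation of a top class."""
    negated_tops = {"¬" + top for top in top_classes}
    return [d for d in disjuncts if not negated_tops & set(d)]


# Person ⊓ (∃authorConf.ACMConf ⊔ ¬Person), where Person is a top class
q = ("and", "Person", ("or", "∃authorConf.ACMConf", "¬Person"))
q_dnf = dnf(q)
q_red = drop_empty(q_dnf, {"Person"})
```

The second disjunct, Person ⊓ ¬Person, contains the negation of a top class and is therefore dropped from the reduced disjunction.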

Example 4
Let q 1 , q 2 and q 3 be queries specified in Example 3. Let classes A 1 , A 2 , and A 3 be defined by the following specializations: 2. Transforming q 2 = ∃P 1 .¬(A 1 A 2 ) into FacNF:

Examples of query formulation
Now, we show how some representative queries can be formulated in the DAFO system [24]. Below, we consider queries, some of which concern the involvement of people in conferences: ACM conferences and/or conferences in the USA. Thus, a user can start by indicating the classes relevant to the intended query (as a query template, Fig. 5a): Person as the type of the expected answers, as well as ACMConf and USAConf, which are somehow related to Person. In response, a faceted interface is created (Fig. 5b). Note that a person can be connected to conferences as a participant and/or as a PC member. Formulation of a query over a faceted interface requires a sequence of operations on this interface. Each state of a faceted interface represents a faceted query, so operations over the interface are transformations in a space of faceted queries. We will consider the following kinds of (faceted) queries:
1. Positive existential queries:
- with default conjunctive and disjunctive facets: query q_1,
- a disjunctive set is switched to a conjunctive one: query q_2,
- a subtree is cloned (duplicated) and values of some data properties are added: query q_3.
2. Queries with negation:
- query with one negation (exclusion): query q_4,
- query with double negation (equivalent to a universal quantification): query q_5.
3. Query with a number restriction: query q_6.
4. Query with a local reflexivity (a cycle): query q_7.
For every query we provide:
- a natural language version,
- a DL version in SROIQ Fac,
- graphical forms of faceted queries (q_1)-(q_7), their first-order forms for (q_1)-(q_5), and a notation involving variables for (q_6) and (q_7), all as screenshots from DAFO,
- operations on the faceted interface used to create the final faceted query.
1. Positive existential queries
q_1: "Authors of papers presented at an ACM conference or at a conference in the USA". In Fig. 6a, the query is expressed as a faceted query, and in Fig. 6b there is a first-order form of q_1. The query is created by performing the following sequence of operations on the faceted interface in Fig. 5b: pcMemberOf.Uncheck(); RemoveAllUnchecked().
In Fig. 7, the query is depicted as a faceted query and its first-order form as a result of operating on the query/faceted interface in Fig. 6a: authorConf.SetToAND().
The result is in Fig. 9-q4(a) and q4(b).
In q_5, negation is used to express a universal quantification. q_5: "Papers written only by PUTAuthors". q_5 = Paper ⊓ ¬∃writtenBy.¬PUTAuthor = Paper ⊓ ∀writtenBy.PUTAuthor.
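The equivalence used in q_5 can be checked on the first-order reading of the two expressions. The derivation below is a sketch that reads the inner negation classically over the values of writtenBy; it assumes that every value of writtenBy belongs to the top class of the hierarchy containing PUTAuthor, so that the guarded complement and the classical complement agree on these values.

```latex
\begin{align*}
q_5 &= \{\, x \mid \mathit{Paper}(x) \land \neg\exists y\,(\mathit{writtenBy}(x,y) \land \neg\mathit{PUTAuthor}(y)) \,\}\\
    &= \{\, x \mid \mathit{Paper}(x) \land \forall y\,(\neg\mathit{writtenBy}(x,y) \lor \mathit{PUTAuthor}(y)) \,\}\\
    &= \{\, x \mid \mathit{Paper}(x) \land \forall y\,(\mathit{writtenBy}(x,y) \rightarrow \mathit{PUTAuthor}(y)) \,\}
\end{align*}
```

The last line is the first-order form of Paper ⊓ ∀writtenBy.PUTAuthor. Note that a paper without any writtenBy value satisfies the condition vacuously.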

Queries with number restrictions
q_6: "Authors participating in over ten ACM conferences in the USA". The query is formulated in DAFO as shown in Fig. 10. The argument of count(x_1) indicates that x_1 is tested for the required number of distinct values. If x is connected to more than 10 different valuations of x_1, then this valuation of x is returned as an answer. To formulate the query, we use the query in Fig. 7a as a faceted interface and perform the following operation on it: authorConf.SetNumRestr.count()>10;
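The counting condition can be illustrated with a short Python sketch. The function name `more_than` and the sample data are hypothetical; the sketch only shows the intended reading of the restriction, namely that distinct values are counted and that the threshold is strict.

```python
from collections import defaultdict


def more_than(pairs, allowed, threshold):
    """Subjects linked by `pairs` to more than `threshold` distinct objects in `allowed`."""
    seen = defaultdict(set)
    for subject, obj in pairs:
        if obj in allowed:
            seen[subject].add(obj)      # a set keeps only distinct values
    return {subject for subject, objs in seen.items() if len(objs) > threshold}


# ann is connected to 11 distinct conferences, bob to 10 (one of them listed twice)
author_conf = [("ann", f"c{i}") for i in range(11)]
author_conf += [("bob", f"c{i}") for i in range(10)] + [("bob", "c0")]
acm_usa_confs = {f"c{i}" for i in range(11)}

answers = more_than(author_conf, acm_usa_confs, 10)
```

Only ann is returned: the duplicate pair for bob does not raise his count above ten.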

Queries with local reflexivity (looking for cycles).
q_7: "Authors of papers presented at conferences where the author was a PC member". q_7 = Person ⊓ ∃authConfPCMember.Self, where authorConf ∘ confPCMember ≡ authConfPCMember. The query is formulated in DAFO as shown in Fig. 11a. A syntax tree in Fig. 11b is an intermediate form with variables and a global variable @x that indicates objects which are connected with themselves by the property authorConf ∘ confPCMember. The occurrences of @x determine the equality [x = x_3], saying that we are looking for objects assigned to x and x_3 that are equal.
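A composed property and the Self restriction can be illustrated on finite relations. In the sketch below, the function names and the sample data are hypothetical, and confPCMember is assumed to relate a conference to its PC members.

```python
def compose(r, s):
    """Relational composition: pairs (x, z) such that r(x, y) and s(y, z) for some y."""
    successors = {}
    for y, z in s:
        successors.setdefault(y, set()).add(z)
    return {(x, z) for (x, y) in r for z in successors.get(y, ())}


def local_reflexive(r):
    """Objects connected with themselves by r (the extension of ∃R.Self)."""
    return {x for (x, y) in r if x == y}


author_conf = {("ann", "kdd"), ("bob", "kdd"), ("bob", "vldb")}
conf_pc_member = {("kdd", "ann"), ("vldb", "eve")}

auth_conf_pc_member = compose(author_conf, conf_pc_member)
answers = local_reflexive(auth_conf_pc_member)
```

Only ann is an answer: she is an author at kdd and also a PC member of kdd, so the composed property connects her with herself.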

Expressiveness of DAFO compared to other systems
In Table 2, we compare the expressive power of four faceted query systems: DAFO, BrowseRDF [53], Sewelis [34] and SemFacet [7,64]. The first column in the table contains queries in SROIQ Fac , which are used as reference points to interpret the semantics of operations in the compared systems.
Since there are differences in the interpretations of some concepts and notions (especially, negation, aggregation, and recursion), we describe these differences in comments.

1. Disjunction and conjunction. In DAFO, each disjunction q_1 ⊔ q_2 and conjunction q_1 ⊓ q_2 requires that q_1 and q_2 are of the same type.
2. Negation. A negation is computed as the complement with respect to a guard.
- In a negation q_1 ⊓ ¬q_2, the guard is q_1, and q_1 and q_2 are of the same type.
- In DAFO, a negation ¬q is guarded by type(q).
- In BrowseRDF, the negation can only concern the existence of a property. Negation is true for objects that do not have the negated property.
- In Sewelis, the complement is computed with respect to the set of all objects (the universal class).
3. Navigation, recursion, reachability. In all analyzed systems, existential restrictions ∃R.q, ∃R, and ∃R.{a} are fundamental for navigation through the underlying ontology, where: (a) ∃R.q denotes a set of objects connected via an object property R with objects in q, (b) ∃R denotes a set of objects having a property R (irrespective of its value), (c) ∃R.{a} is a set of objects connected via a data property R with a constant a. R can denote a property P or its inverse P⁻ (only for object properties). Properties can be recursively composed, expressing in this way a (restricted) form of recursion. In DAFO, composed properties can be defined as chains of other properties. In SemFacet, so-called reachability atoms, Next(x, y) and Next⁺(x, y), are introduced. They denote (dynamically) a property, or a sequence of properties, leading from x to y.
4. Aggregation. In SROIQ, aggregation is limited to a number restriction. We also do this in DAFO, restricting ourselves to the count() function.
5. Cycles. A reflexivity restriction ∃R.Self denotes objects which are connected via R with themselves. In this way, cycles can be found. In DAFO and in Sewelis, this is achieved by means of special variables (in DAFO prefixed by @). Two different occurrences of the same variable @x in a query indicate that a value of @x is connected with itself by a property (possibly composed) specified in the query.

Materialization of constraint rules
Given a fac-ontology O = (V, C, A), our goal is to convert O into an ontological database O_db = (V, C, A_db) such that A_db is a minimal model of A ∪ C, i.e., A_db ⊨ A ∪ C. Such an A_db can be obtained as the chase of A with respect to C, A_db = chase_C(A). In other words, A is included in A_db and C is materialized in A_db. In general, the chase procedure can: (a) be infinite, (b) terminate with failure, (c) produce a finite set of facts containing A and all consequences of C. Some rules in C, namely disjointness and functionality rules, are used to verify the consistency of the ontological database. The functionality rules can also be used to discover some missing values, represented by labeled nulls.
We divide C into two subsets, C_1 and C_2:
1. C_1 contains inclusion rules of the form C_1 ⊑ C_2 (between class expressions) and R_1 ⊑ R_2 (between properties). Their first-order forms are tuple-generating dependencies ∀x, y(ϕ(x, y) → ∃z ψ(x, z)), where x, y, z are tuples of variables and ϕ(x, y), ψ(x, z) are conjunctions of atoms over the given variables and constants. These rules are used to chase new facts.
2. C_2 contains disjointness rules, A_1 ⊑ ¬A_2 and R_1 ⊑ ¬R_2, and functionality rules (funct S), with first-order forms, respectively: ∀x(A_1(x) ∧ A_2(x) → ⊥), ∀x, y(R_1(x, y) ∧ R_2(x, y) → ⊥), and ∀x, y, z(S(x, y) ∧ S(x, z) → y = z). These rules are mainly used to check if the chased set of facts is consistent.
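The role of the disjointness and functionality rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the DAFO implementation: facts are modeled as tuples `(predicate, arg, ...)`, labeled nulls as instances of a hypothetical `Null` class, and the names `disjointness_violations`, `enforce_functionality`, and `ChaseFailure` are invented for the example. A functionality rule either identifies a labeled null with another term or, when two distinct constants clash, makes the chase fail.

```python
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Null:
    """A labeled null (hypothetical representation)."""
    ident: int


Fact = Tuple  # (predicate, arg1) or (predicate, arg1, arg2)


class ChaseFailure(Exception):
    """Raised when a functionality rule equates two distinct constants."""


def disjointness_violations(facts: Set[Fact], p1: str, p2: str) -> Set[Tuple]:
    """Argument tuples that belong to both p1 and p2 (should be empty)."""
    ext1 = {f[1:] for f in facts if f[0] == p1}
    ext2 = {f[1:] for f in facts if f[0] == p2}
    return ext1 & ext2


def enforce_functionality(facts: Set[Fact], prop: str) -> Set[Fact]:
    """Apply (funct prop): merge values of prop for the same subject."""
    facts = set(facts)
    while True:
        conflict = None
        value_of = {}
        for f in facts:
            if f[0] != prop:
                continue
            x, y = f[1], f[2]
            if x in value_of and value_of[x] != y:
                conflict = (value_of[x], y)
                break
            value_of[x] = y
        if conflict is None:
            return facts
        y, z = conflict
        if not isinstance(y, Null) and not isinstance(z, Null):
            raise ChaseFailure(f"{prop}: distinct constants {y!r} and {z!r}")
        # Replace a null by the other term everywhere in the fact set.
        old, new = (y, z) if isinstance(y, Null) else (z, y)
        facts = {tuple(new if t == old else t for t in f) for f in facts}


# A null value of S for x is identified with the known constant c.
example = {("S", "x", Null(1)), ("S", "x", "c"), ("B", Null(1))}
repaired = enforce_functionality(example, "S")
```

Each merge removes one labeled null from the fact set, so the loop in `enforce_functionality` terminates.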

Termination of chase
The chase can be defined as the data exchange problem [21,33], and can be understood as a pair (T, C), where T = Sig(C) ∪ Const is a target schema, and C is a set of target-to-target dependencies. A solution of an instance A of T with respect to C is an instance A′ of T such that A′ ⊨ A ∪ C.
In general, an instance A can have infinitely many solutions.
In particular, if A′ is a solution for A with respect to C, then every A″ containing A′ is also a solution for A. A set A′ of facts is the universal solution of A with respect to C if (a) A′ is a solution of A, and (b) for each solution A″ of A there is a homomorphism h from A′ to A″ (h is the identity on constants). Intuitively, a universal solution contains no more and no less information than that specified by the given data exchange problem. The chase procedure chase_C(A) is guaranteed to terminate in polynomial time, producing the universal solution for A with respect to C, if the dependency graph of C is weakly acyclic [5,33].
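The homomorphism condition in the definition of a universal solution can be checked by brute force on small instances. The following Python sketch assumes facts are tuples `(predicate, arg, ...)` and labeled nulls are instances of a hypothetical `Null` class; the function name `find_homomorphism` and the two example instances are illustrative and not taken from the paper.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Null:
    """A labeled null (hypothetical representation)."""
    ident: int


Fact = Tuple


def find_homomorphism(src: Set[Fact], dst: Set[Fact]) -> Optional[Dict[Null, object]]:
    """Search for h mapping nulls of src to terms of dst, identity on constants,
    such that the image of every fact of src is a fact of dst."""
    nulls = sorted({t for f in src for t in f[1:] if isinstance(t, Null)},
                   key=lambda n: n.ident)
    targets = sorted({t for f in dst for t in f[1:]}, key=repr)
    for image in product(targets, repeat=len(nulls)):
        h = dict(zip(nulls, image))
        if all((f[0],) + tuple(h.get(t, t) for t in f[1:]) in dst for f in src):
            return h
    return None


# Two illustrative instances: the first uses a null, the second only constants.
with_null = {("Author", "a"), ("authorOf", "a", Null(1)), ("Paper", Null(1))}
ground = {("Author", "a"), ("authorOf", "a", "p"), ("Paper", "p")}

h = find_homomorphism(with_null, ground)    # maps the null to the constant p
back = find_homomorphism(ground, with_null)  # None: the constant p must stay fixed
```

The search is exponential in the number of nulls, so it is only meant to make the definition concrete, not to be used on real data.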

Definition 11
The dependency graph, DG(C), over a set C of constraint rules is constructed as follows: (a) for every class A occurring in C there is a node (A, 1) in DG(C); (b) for every property R occurring in C there are two nodes, (R, 1) and (R, 2), in DG(C); (c) for every rule of the form A_1 ⊑ ∃R.A_2 in C there are three edges in DG(C):
C is weakly acyclic if its dependency graph DG(C) does not have any weak cycle, i.e., a cycle going through a special edge.
Sets of constraint rules in fac-ontologies usually have many weak cycles because properties and their inverses are usually taken into account. The dependency graph of C_a is depicted in Fig. 12. The graph has a weak cycle, and this leads to an infinite chase. On the other hand, a finite solution exists, but this solution is not the universal solution. We see that A′ is a solution for A. However, A″ is also a solution. There is no homomorphism h : A′ → A″, since h would not be the identity on a: we would need both h(a) = a and h(a) = N_2. In this case, the universal solution for A with respect to C_1 does not exist.
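The weak-acyclicity test can be sketched in Python. The sketch assumes the standard position-based construction known from data exchange: a rule A_1 ⊑ ∃R.A_2, read as A_1(x) → ∃y(R(x, y) ∧ A_2(y)), yields a regular edge from (A_1, 1) to (R, 1) and special edges from (A_1, 1) to (R, 2) and to (A_2, 1). The rule encoding (`"exists"`, `"inverse"`), the function names, and the author/paper rules are hypothetical illustrations.

```python
from typing import Dict, Iterable, List, Set, Tuple

Node = Tuple[str, int]            # (predicate, position)
Edge = Tuple[Node, Node, bool]    # (source, target, is_special)


def dependency_edges(rules: Iterable[Tuple]) -> Set[Edge]:
    """Build the edges of the dependency graph for two rule shapes:
    ("exists", A1, R, A2)  encodes  A1 ⊑ ∃R.A2
    ("inverse", R1, R2)    encodes  R1 ⊑ R2^-
    """
    edges: Set[Edge] = set()
    for rule in rules:
        if rule[0] == "exists":
            _, a1, r, a2 = rule
            edges.add(((a1, 1), (r, 1), False))   # x is propagated
            edges.add(((a1, 1), (r, 2), True))    # fresh value at (R, 2)
            edges.add(((a1, 1), (a2, 1), True))   # fresh value at (A2, 1)
        elif rule[0] == "inverse":
            _, r1, r2 = rule
            edges.add(((r1, 1), (r2, 2), False))
            edges.add(((r1, 2), (r2, 1), False))
        else:
            raise ValueError(f"unsupported rule shape: {rule!r}")
    return edges


def has_weak_cycle(edges: Set[Edge]) -> bool:
    """True if some cycle of the graph goes through a special edge."""
    succ: Dict[Node, Set[Node]] = {}
    for s, t, _ in edges:
        succ.setdefault(s, set()).add(t)

    def reachable(start: Node, goal: Node) -> bool:
        stack: List[Node] = [start]
        seen: Set[Node] = set()
        while stack:
            n = stack.pop()
            if n == goal:
                return True
            if n not in seen:
                seen.add(n)
                stack.extend(succ.get(n, ()))
        return False

    # A special edge s -> t lies on a cycle iff s is reachable from t.
    return any(reachable(t, s) for s, t, special in edges if special)


cyclic_rules = [("exists", "Author", "authorOf", "Paper"),
                ("exists", "Paper", "writtenBy", "Author")]
acyclic_rules = [("exists", "Author", "authorOf", "Paper"),
                 ("inverse", "authorOf", "writtenBy")]

cyclic = has_weak_cycle(dependency_edges(cyclic_rules))
acyclic = has_weak_cycle(dependency_edges(acyclic_rules))
```

In the first rule set, every author needs a paper and every paper needs an author, so fresh values can be generated forever; replacing the second rule by an inverse-inclusion rule removes the special edge that closes the cycle.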

Reflexive weak cycles
Now, we identify a class of weak cycles in dependency graphs and call them reflexive weak cycles (RWC). For RWC, we will modify the chase problem. The modified chase will terminate with a universal solution. This universal solution is also a solution (but not a universal solution) for the chase before the modification.

Definition 12
Let C be a set of constraint rules and DG(C) a dependency graph over C. A reflexive weak cycle (RWC) in DG(C) is a cycle of the form such that: An RWC is abbreviated as (A_1, R_1, A_2, ..., A_n, R_n, A_1).
Proof We prove the proposition for n = 2. Let us assume that R_1 ⊑ R_2^− and dom(R_1) = A. (⇒) If A(a), then, for some b, R_1(a, b) and R_2(b, a). Therefore, (R_1 • R_2)(a, a). (⇐) Let (R_1 • R_2)(c, c). Then, for some d, R_1(c, d) and R_2(d, c). Thus, A(c). This reasoning can easily be extended to any n > 2.
Proposition 3 shows that the chain R_1 • ··· • R_n of properties belonging to an RWC is "locally reflexive," i.e., any object in the domain of R_1 is connected with itself via this chain. This property justifies the term "reflexive" for the class of weak cycles under consideration. We define a p-aware chase as a chase in the presence of an RWC p. The aim is that the p-aware chase terminates with a solution, although not a universal solution.
Definition 13 Let C be a set of constraint rules, A a set of facts over Sig(C) ∪ Const, and p = (A_1, R_1, A_2, ..., A_n, R_n, A_1) an RWC over C. A p-aware chase of A with respect to C, chase
Intuitively, our goal is to prevent firing of the rule A_n ⊑ ∃R_n.A_1. Instead, the rule R_1 • ··· • R_{n−1} ⊑ R_n^− is applied. First-order forms of the rules involved in the chase are: … writtenBy(y, x)).
The p-aware chase consists of the following steps: authorOf(a, N_1), authorOf(a, N_1), Paper(N_1), writtenBy(N_1, a)
A′ is a universal solution for A with respect to {σ_1, σ_3}, and A′ is also a solution for A with respect to {σ_1, σ_2, σ_3}, but not a universal solution. To show that A′ is not a universal solution for A with respect to {σ_1, σ_2, σ_3}, let us note that, for example, A″ is also a solution for A with respect to {σ_1, σ_2, σ_3}, but there is no homomorphism h : A′ → A″ preserving constants (we would need both h(a) = a and h(a) = N_2).
Then A′ is a universal solution for A with respect to C′ and a solution for A with respect to C.
Proof The set C′ is weakly acyclic, so chase_C′(A) produces a universal solution A′ for A with respect to C′. We have to show that A′ is also a solution for A with respect to C, i.e., that the removed rule, σ = A_n ⊑ ∃R_n.A_1, also holds in A′: Let A_1 be a result of chasing just before applying the rule σ′ = ∀x, y((R_1 • ··· • R_{n−1})(x, y) → R_n(y, x)). Then … a_n), A_n(a_n)}, for some constants a_1, a_n. Now, the rule σ′ is applicable. After the application: and A_2 also satisfies σ, i.e., A_2 ⊨ A_n ⊑ ∃R_n.A_1.
The above reasoning shows that every result A′ = chase_C′(A) of chasing that satisfies σ′ also satisfies σ. This proves that A′ is a solution for A with respect to C. However, A′ is not a universal solution for A with respect to C, as was shown in Example 7.
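The effect of replacing the rule A_n ⊑ ∃R_n.A_1 by the inverse-inclusion rule can be reproduced with a small chase sketch in Python. It is a simplified illustration, not the DAFO implementation: only two rule shapes are supported, labeled nulls are plain strings `N1`, `N2`, ..., a rule fires only when it is not yet satisfied, and the names `chase`, `apply_once`, `holds_exists`, and the rule encoding are hypothetical. The rules follow the author/paper example: σ_1 = Author ⊑ ∃authorOf.Paper, σ_2 = Paper ⊑ ∃writtenBy.Author, and σ_3 = authorOf ⊑ writtenBy^−.

```python
from itertools import count
from typing import Iterator, List, Set, Tuple

Fact = Tuple[str, ...]


def apply_once(facts: Set[Fact], rules: List[Tuple], fresh: Iterator[int]) -> bool:
    """Fire the first applicable trigger; return True if the fact set changed."""
    for rule in rules:
        if rule[0] == "exists":                      # A1 ⊑ ∃R.A2
            _, a1, r, a2 = rule
            for x in sorted(f[1] for f in facts if len(f) == 2 and f[0] == a1):
                witnessed = any(g[0] == r and g[1] == x and (a2, g[2]) in facts
                                for g in facts if len(g) == 3)
                if not witnessed:
                    n = f"N{next(fresh)}"            # fresh labeled null
                    facts.add((r, x, n))
                    facts.add((a2, n))
                    return True
        elif rule[0] == "inverse":                   # R1 ⊑ R2^-
            _, r1, r2 = rule
            for g in sorted(f for f in facts if len(f) == 3 and f[0] == r1):
                if (r2, g[2], g[1]) not in facts:
                    facts.add((r2, g[2], g[1]))
                    return True
    return False


def chase(facts: Set[Fact], rules: List[Tuple], max_steps: int = 50) -> Set[Fact]:
    """Chase until no rule is applicable; give up after max_steps steps."""
    facts = set(facts)
    fresh = count(1)
    for _ in range(max_steps):
        if not apply_once(facts, rules, fresh):
            return facts
    raise RuntimeError("chase did not terminate within max_steps")


def holds_exists(facts: Set[Fact], a1: str, r: str, a2: str) -> bool:
    """Check that A1 ⊑ ∃R.A2 is satisfied in the fact set."""
    return all(any(len(g) == 3 and g[0] == r and g[1] == f[1] and (a2, g[2]) in facts
                   for g in facts)
               for f in facts if len(f) == 2 and f[0] == a1)


sigma1 = ("exists", "Author", "authorOf", "Paper")
sigma2 = ("exists", "Paper", "writtenBy", "Author")
sigma3 = ("inverse", "authorOf", "writtenBy")

# p-aware chase: sigma2 is not fired, sigma1 and sigma3 are.
result = chase({("Author", "a")}, [sigma1, sigma3])
sigma2_holds = holds_exists(result, "Paper", "writtenBy", "Author")
```

Starting from the single fact Author(a), the sketch produces authorOf(a, N1), Paper(N1), and writtenBy(N1, a), and the removed rule σ_2 is satisfied in the result because a itself serves as the required author of N1.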
The p-aware chase can be generalized to an arbitrary set of RWCs. For p = (A_1, R_1, A_2, ..., A_n, R_n, A_1), we denote σ(p) = A_n ⊑ ∃R_n.A_1.

Definition 14
Let C be a set of constraint rules, A a set of facts over Sig(C) ∪ Const, and {p_1, ..., p_k}, k ≥ 1, a set of RWCs. A {p_1, ..., p_k}-aware chase of A with respect to C, chase

Ontologies and databases
There is a long research tradition in investigating similarities and differences between ontologies and databases [1,13,14,49,50]. The investigations concern both the underlying theories and behavior in practice. Ontologies offer richer semantics due to the ability to represent both extensional (by means of a set of facts) and intensional (by means of a set of rules) knowledge. Thus, ontologies are commonly considered as the best method to specify conceptualizations of application domains (conceptual models) [11,20,40]. However, the efficiency of query answering on ontologies is unsatisfactory [17,37]. Databases, on the contrary, are characterized by less expressive semantics but higher efficiency. Therefore, databases based on relational or graph models can be used to store ontological instances. Other differences relate to the interpretation of the rules [1,50]. In particular, all rules are interpreted as integrity constraints in databases and as deductive rules in ontologies. To overcome the differences in treating rules in ontologies and in databases, in [50] a concept of the extended knowledge base is proposed, where the set of rules is divided into a set of integrity constraint rules (satisfied in the knowledge base) and a set of deductive rules (representing intensional knowledge). Then, integrity constraints can be disregarded while answering positive queries [50,51]. The combination of ontologies and databases has found a satisfactory solution in ontology-based data access (OBDA) [16,19,63,65,75], where the TBox of an ontology is used as a global schema over a set of integrated databases.
An OBDA specification is a triple P′ = (O′, M′, Sch), where O′ = (T′, A′) is an ontology, Sch is a data source schema, and M′ is a mapping from Sch to the signature of O′ [75]. For an instance D of Sch, A′ = M′(D), and M′ is a set of source-to-target tuple-generating dependencies of the form ∀x, y(Φ(x, y) → α(x)), where Φ is a conjunction of n-ary atoms, and α is a unary or a binary atom. A query q over O′ is rewritten with respect to T′ and M′ into a first-order query over the data source. Notice that then: (a) the rules in T′ are not satisfied in the data source; instead, they are used in the first-order query rewriting and are not very expressive, usually in DL-Lite [17] or sticky [37]; (b) integrity constraints are defined and managed in the data source management system.
In this paper, we present a DAFO approach to ontological databases that differs from OBDA as follows. Let P = (O_db, M, Sch) be a DAFO specification, and P′ = (O′, M′, Sch), O′ = (T′, A′), an OBDA specification. Then: (a) O_db = (V, C, A_db), where T′ is a subset of C, T′ ⊆ C, and is satisfied in A_db as a result of the chase procedure; (b) M is a mapping from the signature of A_db into Sch consisting of dependencies of the form ∀x(α(x) → ∃y R(x, y)), where α ranges over unary and binary atoms, and R is an n-ary relation name, n ≥ 1. The inversion of M, in the sense of Fagin [5,32], is an OBDA mapping; (c) V is a set of rules defining intensional predicates, i.e., predicates not occurring in A_db; these predicates are used in query formulation and "compensate" for the poor expressive power of mapping dependencies in DAFO; (d) the query rewriting in DAFO is reduced to unfolding intensional predicates with their extensional definitions and applying mappings to translate first-order queries into SQL queries.

Faceted queries over ontologies
In the traditional setting, the reasoning tasks in ontologies include satisfiability and subsumption of concept expressions (with respect to a TBox), and instance checking (with respect to an ABox) [9,61]. New applications of ontology-based systems require not only reasoning capabilities, but also query answering mechanisms [7,16,37,50,69]. Moreover, in [9] it was shown that reasoning tasks over an ontology can be realized by means of queries. Queries over ontologies can be expressed using different query mechanisms [2,7], from first-order logic formulas to graph query languages, such as SPARQL [67], Cypher [36,39,68], or Gremlin [4]. However, expressive languages such as SPARQL are not well suited for end-users. Thus, we observe attempts to develop interactive graphic-oriented ontology query languages, for example, ViziQuer [78], SPARQLGraph [62], OptiqueVQS [66,73], SEWASIE [10], OntoVQL [31], NL-Graphs [27], and K-search [12], which present ontology views combined with form-based query entry interfaces. A promising alternative to the aforementioned languages are approaches based on faceted search, resulting in faceted queries [7,70].
Faceted search has emerged as a foundation for interactive information browsing and retrieval and has become increasingly prevalent in online information access systems, particularly for e-commerce and site search [7,70,71,72,74]. Especially significant is the combination of browsing and searching in more flexible ways to support non-professional end-users in finding information. The implementation of the browsing paradigm allows for exploring and expressing information needs in interactive and iterative ways [42,72,74]. Most importantly, browsing and exploring concern both data and metadata. Faceted queries are created interactively and iteratively during the faceted search.
The first systems of this kind are /facet [43] and gFacet [42], which identify and implement the basic features of the semantic faceted search paradigm. These two systems operate over RDF data. The expressive power of /facet and gFacet is low. Multiple selections are connected by a logical AND and thus restrict the result set to only objects that satisfy all selections [42]. Exploration is restricted to a navigation during which a conjunction of constraints is added to or removed from a dynamically created faceted query. In gFacet [42], facets and result sets are represented as nodes connected by directed edges labeled by semantic relations between nodes in a graph visualization. More expressive faceted navigation for RDF data was proposed in BrowseRDF [53]. The proposed set of operators describes faceted browsing in terms of a set of manipulations and is defined on an RDF graph. The operators are: basic selection, existential selection, not-existential selection, join and their inversions, and intersection. Further enrichment of the expressive power of exploratory search was proposed in such systems as Sewelis [35] (which allows searching for a limited form of cycles), VisiNav [41], and OpenLink Virtuoso [30]. Faceted search solutions are offered as commercial products by some leading software vendors (e.g., ORACLE [52], Microsoft [45], IBM [23] and Apache [3]). There are a large number of implemented systems, which are mostly based on RDF/S. About 30 faceted search systems based on RDF/S datasets were recently surveyed in [72].
Results in [7] can be considered a milestone in the development of the theory of faceted search systems. The authors propose a rigorous theoretical underpinning for faceted search in the context of RDF and OWL 2 ontology profiles. The expressive power of the faceted search language considered in [7] is limited to the description logic positive existential queries (PEQ). The theory is used in the implementation of SemFacet [6,38] and Ontop [15]. Next, the query language in SemFacet has been extended with a restricted form of aggregation and recursion [47,64]. A first approach to view an ontology as a nested facet system for human data integration was proposed in [77].
In this paper, we extend the concept of "flat" facets proposed in [7] to nested facets, which are used to propose a faceted view of fac-ontologies. A faceted view equipped with a set of operations is defined as a faceted interface that allows exploring the ontology and creating queries. The queries are formulated using this graphical tree-shaped faceted interface. This way of querying determines the so-called faceted normal form (FacNF) of queries. We prove that every expression in SROIQ (with some limitations) can be converted into FacNF, and thus can be created using a faceted interface. This way of formulating queries requires that the ontology meets certain conditions. These conditions are met by fac-ontologies, which are proposed in this paper.

Conclusions and future work
In this paper, we have proposed a formal approach and a methodology to create ontological databases with a faceted interface treated as a builder for faceted queries. We identified a class of ontologies, called fac-ontologies, over which a faceted human-oriented interface can be created. We have specified conditions for the class of fac-ontologies and defined the concept of a nested facet, which provides a hierarchical faceted view over fac-ontologies. A hierarchical view with a set of operations constitutes a faceted interface used to formulate queries on the fac-ontology. The set of rules in the TBox consists of two sets: (a) a set V of rules defining intensional predicates, which extend the vocabulary and enrich the expressive power of the fac-ontology, and (b) a set C of constraint rules, which are used in the chase procedure to transform a fac-ontology into an ontological database. We show that any query in the description logic SROIQ (with some restrictions) can be formulated using the proposed faceted interface.
We see many directions for future work. (1) Transforming heterogeneous data sources into a relational database. In the DAFO approach, an ontology is mapped to a relational database. In this way, an ontological specification of data sources based on, e.g., XML [29], JSON [26] or RDF [60], can be mapped and materialized in a relational database. Then, the integrated repository can be effectively queried using a faceted interface. (2) Exploratory search. Our solution has a rather limited capability for exploratory search of ontologies, which is a characteristic of faceted search systems. Therefore, it would be interesting to enrich the faceted interface with an extended ability to navigate through ontologies whose structure is unknown to the user.
The considerations in the paper are based on the DAFO system (Data Access based on Faceted queries over Ontologies), which was implemented on top of a commercial relational database engine and ensures high efficiency of query answering. Some details of the implementation as well as the high efficiency of query answering are reported in our previous work [55,56,58].
The performance of DAFO was evaluated on the basis of bibliographic datasets containing data on authors, papers, proceedings, and conferences [56]. The basic dataset was prepared by extracting data from DBLP resources (from XML, HTML, and BibTeX files), and enriched with data extracted from personal and conference home pages. This basic dataset includes data on 1907 conferences, 1853 proceedings, 3818 papers, 65 affiliations, and 61 authors. The dataset is organized in the form of an ontological database. The DAFO server is written in C# with .NET Core 2.2, the DAFO client is written in JavaScript, and the extensional part of the ontological database is stored in SQL Server (under the license Microsoft Imagine Premium). The total execution time consists of the time of: (1) transforming the faceted query into an extensional first-order form, (2) translating the extensional form into an SQL query, (3) executing the SQL query. It turns out that step (1) is the most time-consuming. The experiments showed that the total response time is very promising and, for queries similar to the examples in Sect. 6.1, is less than 50 ms. The system is available on GitHub [24].