1 Introduction

This is a report on the doctoral dissertation [1]; it briefly explains the questions investigated and the results obtained.

Description logics (DLs) are a family of logic-based knowledge representation languages used to describe the knowledge of an application domain and to reason about it in a formally well-defined way [2]. DLs allow one to describe the important classes of the application domain as concepts, which formalize the necessary and sufficient conditions for individual objects to belong to that concept, expressed as a combination of atomic properties (concept names) and properties that refer to relationships with other elements (role restrictions). To encode the conceptual knowledge, the user can then state how these concepts relate to each other, for example by giving superconcept–subconcept relationships. Additionally, DLs allow one to express knowledge about individual objects, namely to which concepts they belong and how they relate to each other.
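As a small illustrative example (with hypothetical concept, role, and individual names), a knowledge base could contain the terminological axioms
\[
  \mathsf{Parent} \equiv \mathsf{Person} \sqcap \exists \mathsf{hasChild}.\mathsf{Person}, \qquad \mathsf{Mother} \sqsubseteq \mathsf{Parent},
\]
stating that parents are exactly the persons with at least one child who is a person, and that every mother is a parent, together with the assertions \(\mathsf{Person}(\mathsf{anna})\), \(\mathsf{hasChild}(\mathsf{anna},\mathsf{ben})\), and \(\mathsf{Person}(\mathsf{ben})\) about individual objects; from these, one can infer that \(\mathsf{anna}\) belongs to the concept \(\mathsf{Parent}\).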

A variety of different DLs exist, from the inexpressive but efficient DL \(\mathcal{EL}\), to the propositionally complete DL \(\mathcal{ALC}\), to very expressive DLs; they differ in the set of constructors one can use to build concepts, as well as in the types of axioms available to describe the relations between concepts or individuals. However, all classical DLs have in common that they can only express exact knowledge, and correspondingly only allow exact inferences: either we can infer that an individual belongs to a concept, or we cannot; there is no in-between. In practice, though, knowledge is rarely exact. Many definitions have exceptions or are vaguely formulated in the first place, and people might not only be interested in exact answers, but also in alternatives that are “close enough”. In order to formally talk about how close different alternatives are, one uses semantic similarity and dissimilarity measures between DL concepts.

2 Similarity Measures

Fundamentally, similarity measures quantify how close two things are from a conceptual point of view [3]. Similarity is regarded as one of the fundamental notions of human reasoning. Many different approaches to measuring similarity have evolved, but most rely on the same intuitions: the similarity between two objects increases with the commonalities that they share, and decreases with the differences between them. Additionally, many similarity measures have a notion of maximal and minimal similarity, which intuitively occur when the objects have no differences or no commonalities, respectively.
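As a simple illustration (not the measure developed in the thesis), suppose each object \(x\) is described by a finite set of features \(\mathcal{F}(x)\); then a Jaccard-style ratio
\[
  \mathrm{sim}(x, y) = \frac{|\mathcal{F}(x) \cap \mathcal{F}(y)|}{|\mathcal{F}(x) \cup \mathcal{F}(y)|}
\]
realizes exactly these intuitions: it grows with the shared features, shrinks with the differing ones, and attains the maximal value 1 iff the feature sets coincide and the minimal value 0 iff they are disjoint.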

We are interested in semantic concept similarity measures, which compare the meaning of two DL concepts based on the background knowledge defined in the DL knowledge base. For concept similarity measures, the above intuitions can be formalized into a set of formal properties: a measure should be symmetric and invariant under equivalence of concepts, and the similarity values 0 and 1 should occur exactly if the concepts have no common subsumers or are equivalent, respectively. While many different similarity measures for DL concepts have been defined before, they all have drawbacks: in particular, no measure was able to fully use general (i.e., possibly cyclic) knowledge while at the same time satisfying all formal properties stated above.
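Written out for a measure \(\sim\) that maps pairs of concepts to values in \([0,1]\) (the notation here is only illustrative), these properties read:
\[
  C \sim D = D \sim C, \qquad
  C \equiv C' \text{ implies } C \sim D = C' \sim D, \qquad
  C \sim D = 1 \text{ iff } C \equiv D,
\]
together with the requirement that \(C \sim D = 0\) holds exactly if \(C\) and \(D\) have no common subsumers.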

In [4] we introduce the similarity measure \(\sim_c\), a parameterizable concept similarity measure that works w.r.t. general \(\mathcal{EL}\) knowledge bases. This measure is based on the similarity between the elements of the canonical interpretations of the two concepts, which is computed by the interpretation similarity measure \(\sim_i\). We show that \(\sim_i\) (and thus \(\sim_c\)) is well-defined and computable in polynomial time, while satisfying all of the formal properties stated above [5].
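Here, “general” means that the TBox may contain arbitrary, possibly cyclic, concept inclusions; for instance (with hypothetical names), the axiom
\[
  \mathsf{Person} \sqsubseteq \exists \mathsf{hasParent}.\mathsf{Person}
\]
is cyclic, since \(\mathsf{Person}\) occurs on both sides, and it forces arbitrarily long \(\mathsf{hasParent}\)-chains in every model; in the finite canonical interpretation such a chain is, roughly speaking, represented by a single element with a \(\mathsf{hasParent}\)-edge back to itself.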

3 Query Relaxation

Knowledge about individuals, the categories they belong to, and the relations between them is usually stored in some kind of storage format such as a relational database, an XML file, an RDF triple store, or a DL ABox. In order to access this data, one formulates a query that describes which individuals one is interested in, for instance by restricting the categories they belong to or their relations to other individuals. A query answering system then selects all individuals that satisfy the query and returns them as answers.
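For instance, continuing the hypothetical family example from above, the instance query with query concept
\[
  Q = \mathsf{Person} \sqcap \exists \mathsf{hasChild}.\mathsf{Person}
\]
asks for all persons that have at least one child who is a person; an individual such as \(\mathsf{anna}\), with assertions \(\mathsf{Person}(\mathsf{anna})\), \(\mathsf{hasChild}(\mathsf{anna},\mathsf{ben})\), and \(\mathsf{Person}(\mathsf{ben})\), would be returned as an (exact) answer.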

However, when specifying the query, one may not only be interested in the exact answers, which satisfy every single restriction that is part of the query; alternatives that do not completely satisfy the query, but most of it, may give interesting insights as well. The process of broadening the set of answers to include similar alternatives is often called query expansion or query relaxation, and has attracted a great deal of research.

Classical query relaxation approaches usually only work in the presence of a very simple background ontology, such as a concept hierarchy. Moreover, the process of query relaxation often cannot be influenced by the user. However, a way to specify which aspects of the query are less important and may be relaxed further can be exceedingly useful to control the relaxation process based on user- or query-dependent preferences. In order to allow for parameterizable query relaxation that works with general knowledge, we investigate the problem of instance queries relaxed by concept similarity measures. Formally, an individual is called a relaxed instance of a concept Q w.r.t. a similarity measure, a DL knowledge base, and a threshold t, iff it is an instance of a concept \(Q'\) that is similar to Q with a degree of at least t.
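In symbols (the notation here is only for presentation), an individual \(a\) is a relaxed instance of \(Q\) w.r.t. the knowledge base \(\mathcal{K}\), the similarity measure \(\sim\), and the threshold \(t \in [0,1]\) iff there exists a concept \(Q'\) such that
\[
  \mathcal{K} \models Q'(a) \quad \text{and} \quad Q \sim Q' \ge t.
\]
For an equivalence-closed measure and \(t = 1\), this coincides with the classical instance relation, while lower thresholds admit increasingly dissimilar alternatives.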

The first case we consider is that of arbitrary concept similarity measures and unfoldable TBoxes, which do not allow the definition of cyclic knowledge [6]. We show that the problem of computing all relaxed instances is decidable as long as the similarity measure used for the relaxation has certain properties (it must be equivalence invariant, and the role depth of the concepts must be boundable), but it is also highly inefficient: it has non-elementary complexity. Afterwards we restrict ourselves to a single family of similarity measures, namely \(\sim_c\), but allow for general \(\mathcal{EL}\) TBoxes. In this setting we derive an NP algorithm both for checking whether an individual is a relaxed instance of the query concept and for computing all answers to the relaxed instance query [4, 5].

In [7] we present an implementation for the case of general TBoxes: the Elastiq system. In order to show the usefulness of relaxed instance queries and the \(\sim_c\) measure, we evaluate Elastiq on different ontologies. The results indicate that the answers Elastiq returns are generally quite intuitive and that the ability to tweak the results using the parameters of \(\sim_c\) is very useful; the performance of Elastiq also seems to scale quite well with the size of the ontology. However, choosing a suitable threshold value is often not straightforward, and finer control over the parameters would be desirable.

4 Prototypical Definitions

In practical applications one often cannot define all relevant concepts exactly by giving necessary and sufficient conditions. In fact, it has been argued that humans generally recognize categories by prototypes rather than concepts. For example, it is impossible to define an abstract concept like “games” using just a set of necessary and sufficient conditions, such that the definition includes various things like video games, Olympic games, and jigsaw puzzles, while excluding all non-games [8]. Instead, other formalisms like prototypical definitions, where we can define games as things close to one or more prototypical objects, might be more useful.

In order to be used within a formal knowledge representation language with automated reasoning capabilities, such prototypes need to be equipped with a formal semantics. For this, we use ideas underlying Gärdenfors’ conceptual spaces [9], where categories are explained in terms of convex regions defined using the distance from a focal point. To obtain a concrete representation language, we define prototype distance functions [10], which return for each element of an interpretation the distance of this element to a focal point; if the focal point is given as a specific individual, these functions could be seen as dissimilarity measures between elements. Prototype distance functions then allow the introduction of a new concept constructor for specifying prototypes: \(P_{\le t}(d)\) selects all elements of an interpretation for which the prototype distance function d returns at most distance t.
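In terms of the usual model-theoretic semantics, this constructor can be written as
\[
  \bigl(P_{\le t}(d)\bigr)^{\mathcal{I}} = \{\, x \in \Delta^{\mathcal{I}} \mid d^{\mathcal{I}}(x) \le t \,\},
\]
where \(\Delta^{\mathcal{I}}\) is the domain of the interpretation \(\mathcal{I}\) and \(d^{\mathcal{I}}(x)\) denotes the distance that the prototype distance function \(d\) assigns to the element \(x\) in \(\mathcal{I}\) (the superscript notation is used here only for presentation).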

We give a concrete formalism for prototypical definitions in the DL \(\mathcal{ALC}\), which uses weighted alternating parity tree automata (wapta) to specify prototype distance functions. In order to show that \(\mathcal{ALCP}(\text{wapta})\), i.e., the DL \(\mathcal{ALC}\) extended with prototypes defined by wapta, is decidable, we first show how unweighted automata can be used to decide concept satisfiability in \(\mathcal{ALC}\). Afterwards, we present a cut-point construction that computes an unweighted automaton \(\mathcal{A}_{\le n}\) which recognizes exactly the cut-point language of a wapta \(\mathcal{A}\) with threshold n, i.e., the language of all trees to which \(\mathcal{A}\) assigns a distance of at most n. Finally, we show that one can combine the automaton used to decide concept satisfiability in \(\mathcal{ALC}\) with the cut-point versions of the prototype distance automata in order to decide concept satisfiability in \(\mathcal{ALCP}(\text{wapta})\). We show that, if numbers are encoded in unary, then reasoning in \(\mathcal{ALCP}(\text{wapta})\) is ExpTime-complete, and thus not harder than in classical \(\mathcal{ALC}\) without prototypes [10].
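The cut-point language mentioned above can be written as
\[
  L_{\le n}(\mathcal{A}) = \{\, T \mid \|\mathcal{A}\|(T) \le n \,\},
\]
where \(\|\mathcal{A}\|(T)\) denotes the weight (distance) that the wapta \(\mathcal{A}\) assigns to the input tree \(T\); the notation is only illustrative, but it makes explicit that \(\mathcal{A}_{\le n}\) accepts precisely those trees whose distance does not exceed the threshold.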

5 Conclusions and Outlook

The three main contributions of our research are the concept similarity measure \(\sim_c\), the definition of relaxed instances, and an approach to defining and reasoning with prototypes using weighted automata. The thesis [1] explains all contributions in more depth and provides a detailed discussion of and comparison with related approaches. There are many directions in which this work can be extended. While extensions to more expressive DLs are always worth investigating, we believe that an extension of the query language for relaxed instance queries, and an investigation of alternative semantics for the weighted automata in the prototype approach, would be particularly worthwhile.