In this section, we will focus on more recent NLIs, starting from 2005. We will not discuss older systems like BASEBALL [21], LUNAR [64], RENDEZVOUS [9], LADDER [47], Chat-80 [61], ASK [57] and JANUS [62], which are often cited in the research field of NLIs. We will systematically analyze 24 recently developed systems in Sects. 5.1–5.4 based on the sample world introduced in Sect. 2. The main goal is to highlight the strengths and weaknesses of the different approaches based on a particular data model and particular questions, so that the systems can be compared directly. The explanation is based on the information describing the systems found in the corresponding papers. Finally, in Sect. 6.1, we will evaluate the systems against the sample questions and give an overall interpretation of the results.
There are different ways to classify NLIs. In this survey, we divide the NLIs into four main groups based on the technical approach they use:
1. Keyword-based systems: The core of these systems is the lookup step, in which the system tries to match the given keywords against an inverted index of the base and metadata. These systems cannot answer aggregation queries like question Q7 ('What was the best movie of each genre?'). The main advantage of this approach is its simplicity and adaptability.
2. Pattern-based systems: These systems extend keyword-based systems with NLP technologies to handle more than keywords and add natural language patterns. The patterns can be domain-independent or domain-dependent. An example of a domain-independent pattern would be allowing aggregations with the words 'by' or 'how many.' A domain-dependent pattern could, for example, be a concept like 'great movie.'
3. Parsing-based systems: These systems parse the input question and use the resulting information about its grammatical structure. The parse tree contains information about single tokens, but also about how tokens can be grouped together to form phrases. The main advantage of this approach is that the semantic meaning can be mapped to certain production rules (query generation).
4. Grammar-based systems: The core of these systems is a set of rules (a grammar) that defines the questions a user can ask the system. The main advantage of this approach is that the system can give users natural language suggestions while they type their questions. Each question that is formalized this way can be answered by the system.
Table 2 gives an overview of the most representative NLIs of these four categories that we will discuss in Sects. 5.1–5.4. The table also shows which kinds of query languages the systems support (e.g., SQL) as well as which NLP technologies are used. In summary, our approach of systematically evaluating these systems on a sample world with queries of increasing complexity enables a better comparison of the different approaches.
In the next subsections, we will systematically analyze the systems in more detail.
Table 2 Categorization of the NLIs and the used NLP technologies

Keyword-based systems
The core of keyword-based NLIs is the lookup step. In this step, the system tries to match the given keywords against an inverted index of the base and metadata. To identify the keywords in the input question, some systems use stop word removal (e.g., NLP-Reduce [32]), while others expect only keywords from the users as input (e.g., SODA [6]).
Most questions are easily formulated with keywords. However, there are some cases where keywords are not enough to express the intention of the users. For example, for the question 'What was the best movie of each genre?' (Q7), the 'keyword-only version' would be something like 'best movie genre,' which is more likely to be interpreted as 'the genre of the best movie.' If the users wrote the question as 'best movie by genre,' a keyword-based NLI would either try to look up the token 'by' in the base and metadata or classify 'by' as a stop word and ignore it.
In the following, we will summarize seven keyword-based NLIs. We decided to describe the first system, SODA [6], in depth, because it can solve the most sample input questions in this category (see Sect. 2.2). SODA is an NLI that expects only keywords from the user and can handle aggregations by using specific non-natural language templates. Afterward, the other systems are summarized, and we highlight the differences to SODA and to each other.
SODA (Search Over DAta warehouse)
SODA [6] is a system that provides a keyword-based NLI for relational databases with some extensions in the direction of a pattern-based system. The base data consists of the relational database. The metadata can include multiple ontologies, which are handled like natural language patterns, for example, domain-specific ontologies with concepts (like the concept 'great movie' in the sample world) or DBpedia to identify homonyms and synonyms. SODA uses both inverted indexes (base and metadata) as the basis for finding query matches in the data. The key innovation of SODA is that it provides the possibility to define metadata patterns which specify conceptual models. The concept 'great movie' could depend on various variables, not only on the rating but, for example, also on the number of ratings. The users can then apply this concept in their input questions; for example, they could search for 'all great movie' (Q6) without having to specify what a great movie is.
Assuming the users want to know the director of the movie ‘Inglourious Basterds’ (Q1), the input question for SODA could be: ‘director Inglourious Basterds.’
SODA uses five steps to translate this keyword-based input question into a SQL query. The first step is the lookup: it checks the keywords against the inverted indexes over the database and provides all the nodes in the metadata graph where these keywords are found. For the input question Q1, this means that the keyword 'director' is found in the inverted index of the metadata, pointing either to the table name Director or to the attribute names Director.directorId and Directing.directorId (Fig. 7: red). The keyword 'Inglourious Basterds' is only found in the inverted index of the base data as a value of the attribute Movie.Title (Fig. 7: green). This leads to three different solution sets for the next steps: {Directing.directorId, Movie.Title}, {Director.directorId, Movie.Title} and {Director, Movie.Title}.
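To make this lookup step concrete, the following minimal sketch (with toy, hand-filled inverted indexes whose contents follow the sample world) enumerates candidate solution sets the way SODA's first step does:

```python
# Minimal sketch of a SODA-style lookup step (illustrative identifiers,
# not SODA's actual implementation): match keywords against inverted
# indexes over metadata (schema names) and base data (attribute values).
from itertools import product

metadata_index = {
    "director": ["Director", "Director.directorId", "Directing.directorId"],
}
basedata_index = {
    "inglourious basterds": ["Movie.Title"],
}

def lookup(keywords):
    """Return one candidate solution set per combination of matched nodes."""
    matches = []
    for kw in keywords:
        kw = kw.lower()
        nodes = metadata_index.get(kw, []) + basedata_index.get(kw, [])
        if nodes:
            matches.append(nodes)
    # Cartesian product: every combination is a candidate solution.
    return [set(combo) for combo in product(*matches)]

print(lookup(["director", "Inglourious Basterds"]))
# [{'Director', 'Movie.Title'}, {'Director.directorId', 'Movie.Title'},
#  {'Directing.directorId', 'Movie.Title'}]
```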
The second step assigns a score to each solution of the lookup step. SODA uses a simple heuristic method; for example, in-domain solutions receive a higher score. For the input question, the solution {Director, Movie.Title} receives the highest score, because the table name Director is a full match and not only a fuzzy match like in directorId. Afterward, only the best n solutions are passed to the next step.
The third step identifies which tables are used for each of the solutions provided by the previous step. Also, the relationships and inheritance structures between those tables are discovered in this step. For the best solution of the input question, the tables Director and Movie correspond to the different entry points. An entry point is a node in the metadata graph. The table Director is a child of the table Person (ISA-relationship). Therefore, SODA includes the table Person in the solution. To link the table Movie to the other two tables, it is necessary to add more tables to the solution. The closest link is through the table Directing (see Fig. 7), and therefore this table is included.
The fourth step collects the filters. Two types of filters are collected: first, filters given in the input question, like 'Inglourious Basterds'; second, filter conditions that occur while traversing the metadata graph, like the concept 'great movie.'
The fifth and last step generates a reasonable and executable SQL query from the information collected in the previous steps. A reasonable SQL query is a query which considers foreign keys and inheritance patterns in the schema. An executable SQL query is a query that can be executed on the underlying database.
The strengths of SODA are the use of metadata patterns and domain ontologies, which allow one to define concepts and include domain-specific knowledge. In addition, the inclusion of external sources like DBpedia for homonyms and synonyms is beneficial for finding meaningful results. Furthermore, SODA is designed to evolve and thus improve over time based on user feedback.
The weaknesses of SODA are that it uses simple word recognition for comparison operators. For example, to retrieve all movies with a rating greater than 9, the input question needs to be written like 'rating > 9' (Q2). Moreover, SODA uses a very strict syntax for aggregation operators. For example, to retrieve the number of movies per year, the input question needs to be written like 'select count (movie) group by (year).' These patterns are useful, but they are not natural language. Furthermore, there is no lemmatization, stemming or any other preprocessing of the input question, which can lead to problems with words in plural form. For example, the input question 'all movies' would not detect the table Movie, but the input question 'all movie' would display the expected result.
Blunschi et al. [6] suggest extending SODA to handle temporal aspects of the data warehouse (e.g., bi-temporal historization). They also point out that the GUI of SODA should be improved so that the users are engaged in selecting and ranking the different results. Furthermore, the user feedback provided by SODA is currently very basic and needs to be improved.
NLP-Reduce
NLP-Reduce [32] uses a few simple NLP technologies to 'reduce' the input tokens before the lookup in the RDF-based Knowledge Base (KB). The system takes the input question, reduces it to keywords, translates it into SPARQL to query the KB and then returns the result to the user.
In contrast to SODA, NLP-Reduce uses not only synonyms but also two other NLP technologies: (a) stop word and punctuation mark removal and (b) stemming. With the removal of stop words and punctuation marks, NLP-Reduce is able to answer some fragment and full sentence questions. NLP-Reduce still cannot answer questions with aggregations like 'What was the best movie of each genre?' (Q7), because it removes the tokens 'of' and 'each' as stop words. Furthermore, stemming helps the users formulate questions like 'all movies,' which is more natural than 'all movie' for SODA.
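A minimal version of such a reduction step could look as follows, assuming NLTK is installed (the stop word list has to be downloaded once via nltk.download('stopwords')):

```python
# Sketch of an NLP-Reduce-style "reduce" step: strip punctuation and
# stop words, then stem the remaining tokens.
import string
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
stop = set(stopwords.words("english"))

def reduce_question(question: str) -> list[str]:
    tokens = question.lower().translate(
        str.maketrans("", "", string.punctuation)).split()
    return [stemmer.stem(t) for t in tokens if t not in stop]

print(reduce_question("What was the best movie of each genre?"))
# ['best', 'movi', 'genre'] -- 'of' and 'each' are dropped, which is
# exactly why the aggregation intent of Q7 is lost.
```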
After reducing the input question, NLP-Reduce executes similar steps to SODA: (1) search for triples in the RDF graph (similar to base and metadata) where at least one of the question tokens occurs, and rank the found triples; (2) search for properties that can be joined with the triples found in the previous step by the remaining question tokens; (3) search for datatype property values that match the remaining question tokens; and (4) generate the corresponding SPARQL query.
Compared to SODA, the strength of NLP-Reduce is the reduction in the input question such that non-keyword input questions can be answered. Besides the overall weakness of keyword-based NLIs, NLP-Reduce is not able to answer comparison questions like Q2.
Précis
Précis [51] is a keyword-based NLI for relational databases, which supports multiple terms combined through the operators AND, OR and NOT. For example, the input question 'Show me all drama and comedy movies.' (Q4) would be formulated as '"drama" OR "comedy".' The answer is an entire multi-relation database, which is a logical subset of the original database.
First, Précis transforms the input question into disjunctive normal form (DNF). Afterward, each term of the DNF is looked up in the inverted index of the base data. This differs from SODA, where the inverted index also includes the metadata. If a term cannot be found, the next steps are not executed. The third step creates the schema of the logical database subset, which represents the answer to the input question. This includes the identification of the necessary join paths.
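The first step can be illustrated with SymPy (the library choice is ours for illustration; the paper does not prescribe one): the Boolean combination of terms is normalized into DNF before the lookup:

```python
# Sketch of a Précis-style normalization: convert the operator
# combination of query terms into disjunctive normal form.
from sympy import symbols
from sympy.logic.boolalg import to_dnf

drama, comedy, horror = symbols("drama comedy horror")
expr = (drama | comedy) & ~horror   # ("drama" OR "comedy") AND NOT "horror"
print(to_dnf(expr))                 # (comedy & ~horror) | (drama & ~horror)
```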
The strength of Précis is the ability to use brackets, AND, OR and NOT in the input question. However, the weakness is that this operator syntax again constitutes a logical query language, albeit a simpler one. Furthermore, it can only solve Boolean questions, and the input question can only consist of terms that are located in the base data, not in the metadata. For example, the input question 'Who is the director of "Inglourious Basterds"?' (Q1) cannot directly be solved, because 'director' is the name of a table and therefore part of the metadata. There is a mechanism that adds more information to the answer (e.g., the actors, directors etc. of a movie), but then the users would have to search for the director in the answer.
QUICK (QUery Intent Constructor for Keywords)
QUICK [68] is an NLI that adds the expressiveness of semantic queries to the convenience of keyword-based search. To achieve this, the users start with a keyword question and are then guided through incremental refinement steps to select the question's intention. The system provides an interface that shows the semantic queries both as graphs and in textual form.
In the first step, QUICK takes the keywords of the input question and compares them against the KB. Each possible interpretation corresponds to a semantic query. For example, the input question 'Brad Pitt' can either mean 'movies where Brad Pitt played in,' 'movies directed by Brad Pitt' or 'movies written by Brad Pitt' (see Fig. 1). In the next step, the system presents this information in such a way that the users can select the semantic query which answers their question. To do so, QUICK provides the users with possible interpretations of each keyword to select from. This is done with a graph as well as a textual form of the semantic query. The textual form is a template-based translation of the SQL query into natural language. Furthermore, the system orders the keywords in such a way that as few user interactions as possible are needed. When the users select the desired semantic query, QUICK executes it and displays the results in the user interface.
The strength of QUICK is the user interaction interface with the optimization for minimal user interaction during the semantic query selection. The weakness of QUICK is that it is limited to acyclic conjunctions of triple patterns.
QUEST (QUEry generator for STructured sources)
QUEST [4] is a keyword-based NLI to translate input questions into SQL. It combines semantic and statistical ML techniques for the translation.
The first step determines how the keywords in the input question correspond to elements of the database (lookup). In contrast to SODA, QUEST uses two Hidden Markov Models (HMMs) to choose the relevant elements (ranking). The parameters of the first HMM are set according to a set of heuristic rules; the second HMM is trained with user feedback. The next step identifies the possible join paths that connect all the relevant elements from the previous step. QUEST selects the most informative join paths (similar to SODA's third step). The most informative join paths are those that contain tuples in the database. In the third step, QUEST decides which combination of keyword mapping and join path most likely represents the semantics the users had in mind when formulating the keyword question.
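The following toy Viterbi decoder conveys the flavor of such HMM-based keyword mapping; all states, transition and emission probabilities are invented for illustration, whereas QUEST derives them from heuristics and user feedback:

```python
# Toy Viterbi decoding of keywords to database elements, in the spirit
# of QUEST's HMM-based mapper (all numbers are invented).
states = ["Movie.Title", "Director", "Genre.Name"]
trans = {("Director", "Movie.Title"): 0.6, ("Movie.Title", "Director"): 0.6}
emit = {("director", "Director"): 0.9,
        ("inglourious basterds", "Movie.Title"): 0.9}

def viterbi(keywords):
    # best[s] = (probability, path) of the best state sequence ending in s
    best = {s: (emit.get((keywords[0], s), 0.01), [s]) for s in states}
    for kw in keywords[1:]:
        new = {}
        for s in states:
            p, path = max(((best[q][0] * trans.get((q, s), 0.1), best[q][1])
                           for q in states), key=lambda t: t[0])
            new[s] = (p * emit.get((kw, s), 0.01), path + [s])
        best = new
    return max(best.values(), key=lambda t: t[0])[1]

print(viterbi(["director", "inglourious basterds"]))
# ['Director', 'Movie.Title']
```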
The strength of QUEST is the combination of user feedback and a set of heuristic rules during the ranking. This allows the system to learn from the users over time. A weakness of QUEST is that it is not able to handle concepts such as ‘good movie.’
SINA
SINA [50] is a keyword-based NLI that transforms natural language input questions into conjunctive SPARQL queries. It uses a Hidden Markov Model to determine the most suitable resource for a given input question from different datasets.
In the first step, SINA reduces the input question to keywords (similar to NLP-Reduce) by using tokenization, lemmatization and stop word removal. In the next step, the keywords are grouped into segments with respect to the available resources. For example, the keywords 'Inglourious' and 'Basterds' would be grouped into one segment based on the match for 'Inglourious Basterds.' In the third step, the relevant resources are retrieved based on string matching between the segments and the RDF labels of the resources. In the following step, the best subset of resources for the given input question is determined (ranking). In the fifth step, SINA constructs the SPARQL query using the graph structure of the database. Finally, the results, retrieved by evaluating the generated SPARQL query, are shown to the users.
The biggest weakness of SINA is that it can only translate into conjunctive SPARQL queries, which reduces the number of answerable questions.
Aqqu
Aqqu [2] is an NLI which uses templates to identify possible relations between keywords. At the end of the translation process, ML is used to rank the possible solutions.
To translate the input question into SPARQL, first the entities from the KB that match (possibly overlapping) parts of the input question are identified. The possible parts are identified by using PoS tags. For example, single token parts must be a noun (NN), and proper nouns (NNP) are not allowed to be split (e.g., 'Brad Pitt'). In the next step, Aqqu uses three different templates which define the general relationship between the keywords. Afterward, Aqqu tries to identify the corresponding relationship. This can either be done with the help of the input question (verbs and adjectives) or with the help of ML, which can, for example, identify abstract relationships like 'born' → 'birth date.' The last step is the ranking, which is solved with ML. The best result is achieved by using a binary random forest classifier.
The strength of Aqqu is the identification of abstract relationships. The weakness is the general limitation of keyword-based NLIs.
Pattern-based systems
Pattern-based NLIs are an extension of keyword-based systems with natural language patterns to answer more complex questions like concepts (Q6) or aggregations (Q7). For example, the question 'What was the best movie of each genre?' (Q7) cannot be formulated with keywords only. It needs at least some linking phrase between 'best movie' and 'genre' which indicates the aggregation. This could be done with a non-keyword token (trigger word) like 'by' for the aggregation, indicating that the right side includes the keywords for the group by clause and the left side the keywords for the select clause. The difficulty with trigger words is to find every possible synonym allowed by natural language. For example, an aggregation can be implied with the word 'by' but also with 'of each' (compare Q7).
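A domain-independent aggregation pattern of this kind can be as simple as a split on a trigger word, as the following sketch shows (the trigger list is illustrative and deliberately incomplete, which is precisely the difficulty mentioned above):

```python
# Sketch of a trigger-word pattern: split the keywords into a
# select-clause part and a group-by-clause part.
import re

AGG_TRIGGERS = r"\b(by|of each|per)\b"

def split_aggregation(question: str):
    parts = re.split(AGG_TRIGGERS, question.lower(), maxsplit=1)
    if len(parts) == 3:                      # trigger word found
        select, _, group_by = parts
        return {"select": select.split(), "group_by": group_by.split()}
    return {"select": question.lower().split(), "group_by": []}

print(split_aggregation("best movie of each genre"))
# {'select': ['best', 'movie'], 'group_by': ['genre']}
```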
In the following, we summarize two pattern-based NLIs. We decided to describe NLQ/A [69] in depth, because it is based on the idea that the errors made by NLP technologies are not worth the gain of information. Instead, the system relies heavily on the users' input to solve ambiguity problems and therefore focuses on optimizing the user interaction.
NLQ/A
NLQ/A [69] is an NLI to query a knowledge graph. The system is based on a new approach without NLP technologies like parsers or PoS taggers, the idea being that the errors made by these technologies are not worth the gain of information. For example, a parse tree helps for certain questions like subqueries (e.g., Q9), but if the parse tree is wrong, the system will fail even on simpler questions. Instead, NLQ/A lets the users resolve all ambiguity problems, including those which could be solved with PoS tagging or parse trees. To avoid too many interaction steps, NLQ/A provides an efficient greedy approach for the interaction process.
Assuming the users want to know the director of the movie ‘Inglourious Basterds’ (Q1), the input question could be: ‘Who is the director of “Inglourious Basterds”?.’
NLQ/A uses four steps to answer the input question. The first step detects the phrases of the input question. In general, phrases can be categorized into two types: independent and dependent phrases. Independent phrases are identified with a phrase dictionary, which consists of variable, aggregation, operator, modifier and quantifier phrases. To detect dependent phrases, most stop words are removed (simplified input question); some types of words like prepositions are still needed and therefore kept. Next, 1- to n-grams are generated, and phrases starting with prepositions are discarded. After stop word removal, the input question Q1 becomes 'director of Inglourious Basterds.' If n is set to 2, the extracted phrases are: {'director,' 'director of,' 'Inglourious,' 'Inglourious Basterds,' 'Basterds'}. Next, the phrases are extended according to a synonym dictionary. For example, a phrase 'starring' would be extended with the phrase 'playing.' The extended phrases are mapped to the knowledge graph based on string similarity (edit distance). For one extended phrase, there can be multiple candidate mappings.
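The phrase extraction and candidate mapping can be sketched as follows (toy stop word list and graph labels; difflib's similarity ratio stands in for the edit distance, and preposition filtering and synonym extension are omitted):

```python
# Sketch of NLQ/A-style phrase extraction and candidate mapping.
from difflib import SequenceMatcher

STOP = {"who", "is", "the"}          # simplified stop-word list
graph_labels = ["directorOf", "Inglourious Basterds", "starring"]

def ngrams(tokens, n):
    return [" ".join(tokens[i:i + k])
            for k in range(1, n + 1)
            for i in range(len(tokens) - k + 1)]

def candidates(question, n=2, threshold=0.5):
    tokens = [t.strip('"?.,') for t in question.split()]
    tokens = [t for t in tokens if t.lower() not in STOP]
    mapping = {}
    for phrase in ngrams(tokens, n):
        for label in graph_labels:
            score = SequenceMatcher(None, phrase.lower(), label.lower()).ratio()
            if score >= threshold:
                mapping.setdefault(phrase, []).append((label, round(score, 2)))
    return mapping

print(candidates('Who is the director of "Inglourious Basterds"?'))
# 'director of' maps to 'directorOf', 'Inglourious Basterds' maps to
# its label with score 1.0; several phrases get multiple candidates.
```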
The next step takes the candidate mappings and tries to find the true meaning of the input question with the help of the users. To reduce the number of interactions for the users, a phrase dependency graph (PDG) is proposed. The PDG consists of two parts: (PDG1) a graph where each node represents a phrase and two phrases are connected if they share at least one common token, and (PDG2) a subgraph of the knowledge graph consisting of the candidates, where each node represents a candidate and two nodes are connected if they are adjacent in the knowledge graph. The two parts are connected with edges representing the mapping between phrases and candidates (see Fig. 8).
In the third step, the users get involved to solve the ambiguity given in the PDG. In order to reduce the necessary user interactions, the NLI tries to find those edges which resolve the most ambiguities (similar to the idea of QUICK).
The last step takes the selected candidates and tries to connect them into one graph. The connected graph will include the answer to the question. Groups of already connected candidates in the PDG2 are called query fragments. In Fig. 8, the candidates ‘director-Of’ and ‘Inglourious Basterds’ are one query fragment. For each query fragment, the system tries to find the path with the highest similarity to the simplified input question. For the input question Q1, if the users select ‘Director’ as candidate in step 3, the system would find the path as shown in Fig. 9. ‘Inglourious Basterds’ is also a candidate, but not selected by the users because there is no ambiguity to solve.
The strengths of this NLI are the simplicity and the efficient user interaction process. The simplicity allows easy adaptation to new knowledge graphs, and together with the user interaction process, it overcomes the difficulties of ambiguity.
The weakness of this system is that usually more than one user interaction is needed to resolve ambiguities; in the experiments, the average number of interactions was three [69].
QuestIO (QUESTion-based Interface to Ontologies)
QuestIO [11] is an NLI to query ontologies using unconstrained natural language. It automatically extracts human-understandable lexicalizations from the ontology. Therefore, the quality of the semantic information in the ontology has to be very high to contain enough human-understandable labels and/or descriptions. For example, the attribute Movie.ReleaseDate would be extracted as 'Release Date,' which is a human-understandable label. In contrast, the attribute Movie.OriginalLang would result in 'Original Lang,' where the token 'Lang' is a shortened version of 'Language' and is not human-understandable.
QuestIO translates the input question in three steps: In the first step, the key concept identification tool identifies all tokens which refer to mentions of ontology resources such as instances, classes, properties or property values. This is similar to the dependent phrases of NLQ/A. In the next step, the context collector identifies patterns (e.g., key phrases like 'how many') in the remaining tokens that help the system to understand the query (similar to the independent phrases of NLQ/A). The last step identifies relationships between the ontology resources collected during the previous steps and formulates the corresponding formal query. After the query is executed, the result is sent to the result formatter to be displayed in a user-friendly manner.
The automatic extraction of semantic information from the ontology is both a strength and a weakness of QuestIO. It is highly dependent on well-developed human-understandable labels and descriptions; without them, QuestIO is not able to match the input questions to the automatically extracted information.
Parsing-based systems
Parsing-based NLIs go a step further than the previously discussed systems: they parse the input question and use the generated information about the grammatical structure of the question. For example, the grammatical structure can be used to identify the dependencies given by the trigger word 'by' in a question. This is needed for long-range dependencies, which cannot be caught with simple natural language patterns. Furthermore, the dependency parser can help to handle the difficulty of verboseness. For example, the nominal modifier (nmod) can be used to identify aggregations.
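For example, with spaCy (one possible dependency parser; the small English model must be installed first via python -m spacy download en_core_web_sm) the grouping dependency in Q7 becomes visible directly in the parse:

```python
# Sketch: a dependency parse exposes long-range structure that simple
# keyword or pattern matching misses.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("What was the best movie of each genre?")
for token in doc:
    print(f"{token.text:12} {token.dep_:10} head={token.head.text}")
# 'genre' typically attaches to 'movie' as a nominal/prepositional
# modifier, which a parsing-based NLI can read as a grouping dependency.
```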
In the following, we summarize eight parsing-based NLIs. We decided to describe ATHENA [48] in depth, because it can answer the most sample input questions. Furthermore, ATHENA uses the most NLP technologies, and the authors describe all steps in depth. Afterward, the other systems are summarized, and we highlight the delta to ATHENA and the previous systems.
ATHENA
ATHENA [48] is an ontology-driven NLI for relational databases, which handles full sentences in English as the input question. For ATHENA, ontology-driven means that it is based on the information of a given ontology and needs a mapping between the ontology and the relational database. A set of synonyms can be associated with each ontology element. During the translation of an input question into a SQL query, ATHENA uses an intermediate query language before subsequently translating it into SQL.
Assuming the users want to know the director of the movie ‘Inglourious Basterds’ (Q1), the input question for ATHENA could be: ‘Who is the director of “Inglourious Basterds”?.’
ATHENA uses four steps to translate a full sentence input question into a SQL query. In the first step, the ontology evidence annotator is used, which maps the input to a set of ontology elements. There are five types of possible matches:
a. metadata: Finding a match in the inverted index for the metadata (and the associated set of synonyms). Longer matches over the input question are preferred if there are multiple matches.
b. translation index: The translation index is an extension of the inverted index over the base data, which is enriched with variations of person and company names. For example, for the person name 'Brad Pitt,' there would also be an entry 'B. Pitt.'
c. time range expressions: Finding all time ranges like 'from 2000 until 2010' (Q5) with the TIMEX annotator. Those time ranges are then matched to the ontology properties with the corresponding data type.
d. numeric expressions: Finding all tokens that include numeric quantities with the Stanford Numeric Expressions annotator. The numeric quantities can be either in the form of numbers (e.g., 9) or in text form (e.g., nine). Those numeric expressions are then matched to the ontology properties with the corresponding data type.
e. dependencies: Annotating dependencies between tokens in the input question. For example, in the input question Q1, there is a dependency between the tokens 'director' and 'Inglourious Basterds,' indicated by the token 'of.'
For the input question Q1, the metadata annotation detects three different matches for 'director,' namely the table name Director and the attribute names Director.directorId and Directing.directorId (Fig. 10: red). The translation index finds a match for the bi-gram 'Inglourious Basterds,' corresponding to the attribute Movie.Title (Fig. 10: green).
The next step generates a ranked list of interpretations. An interpretation is a set of ontology elements provided by the previous step. If n ontology elements exist for one token, there will also be n different interpretations, one for each ontology element. For the given input question, there are three different interpretations possible: {Directing.directorId, Movie.Title}, {Director.directorId, Movie.Title} and {Director, Movie.Title}. Each interpretation is represented by a set of interpretation trees. An interpretation tree (iTree) is a subtree of the ontology. Each iTree must satisfy:
a. evidence cover: All tokens annotated in the previous step need to be covered.
b. weak connectedness: All concepts need to be at least weakly connected through an undirected path, and each property must be connected to its corresponding concept. For the first interpretation, this means that Director and Movie need to be connected, for example, via the relation Directing. The attribute Title needs to be connected with the corresponding concept (in this case the table) Movie.
c. inheritance constraint: No ontology element is allowed to inherit from its child concepts. For example, the ontology element Person is not allowed to inherit the Role of Actor. The other direction is allowed, such that Actor inherits FirstName and LastName from Person.
d. relationship constraint: All relationships given in the input question are included, independent of the direction of the path. For example, the three tokens 'movies,' 'starring' and 'Brad Pitt' (Q5) imply a relationship constraint between the ontology elements Movie, Starring and Person. Those three ontology elements need to be connected. Accordingly, in this example, the ontology element Actor needs to be included.
For the purpose of ranking the different interpretations, ATHENA generates one single iTree. It can consist of a union of multiple iTrees or a single iTree. Figure 11 shows a possible iTree for the interpretation {Director, Movie.Title}, which is extended with the ontology element Directing and Movie. After this step, for each interpretation only one iTree is left.
The third step uses the ranked list of interpretations to generate an intermediate query in the Ontology Query Language (OQL). OQL was specifically developed for ATHENA to be an intermediate language between the input question and SQL and is able to express queries that include aggregations, unions and single nested subqueries. The structure of an OQL query is similar to SQL and is generated as follows:
a. from clause: Specifies all concepts found in the ontology along with their aliases. The aliases are needed, for example, if a concept occurs multiple times. For example, the input question 'Show me all drama and comedy movies.' (Q4) points to Genre in the ontology twice: once for the token 'drama' and once for 'comedy.' Therefore, two aliases are needed to distinguish between them.
b. group by clause: The group by clause is triggered by the word 'by,' and only tokens annotated with metadata in step 1.a are considered, for example, the input question 'What was the best movie by genre?' (modified Q7). To identify the dependencies between dependent and dependee (illustrated by the 'by'), the Stanford Dependency Parser is used.
c. select clause: There are two possible types: aggregation and display properties. The aggregation properties depend on the group by clause. The default aggregation function is sum. For the (modified) input question Q7, ATHENA would detect a group by clause because 'by genre' needs an aggregation function. Assuming ATHENA can translate 'best movie' to mean 'best ranked movie,' it would apply the aggregation function max on Movie.Rating. If there are no aggregations, ATHENA uses the tokens annotated with metadata as display properties, which are shown to the user.
d. order by clause: Properties used in the order by clause are indicated by tokens like 'least,' 'most,' 'ordered by,' 'top' and others. For example, the input question 'Which movie has grossed most?' (Q3) triggers an order by clause for Movie.Gross because of the trigger word 'most.'
e. where clause: Tokens annotated with the translation index, time ranges or numeric expressions are used in the where clause to filter the result (e.g., the tokens 'Inglourious Basterds'). If the filter is applied to an aggregation, a having clause is generated instead of the where clause.
The final step translates the OQL query into a SQL query, where each attribute and join condition is either concrete or virtual. Concrete means that a direct mapping between the ontology and the database exists (e.g., 'director'). Virtual implies that a (complex) relationship between the ontology elements and the database exists (e.g., 'great movie'). Furthermore, the result of the best ranked interpretation is directly displayed, but the users see the other interpretations as well. All the top n interpretations that ATHENA has found are translated back into full English sentences, so that the users can choose the best fitting one.
The strengths of ATHENA are the ontology as an abstraction of the relational database and the natural language explanation for each interpretation of the input question. The translation index contains not only synonyms but also semantic variants for certain types of values like persons and company names. Furthermore, ATHENA can handle single-level nesting in the input question. An example input question could be ‘All movies with rating higher than the rating of “Sin City”.’ (Q9).
One of the weaknesses of ATHENA is that neither negation (Q8) nor multiple elements in the group by clause (e.g., 'What was the best movie by year and genre?') are supported.
Saha et al. [48] suggest extending ATHENA to handle more than single-level nesting. Furthermore, they suggest enabling a possibility to answer follow-up questions using the context of the previous questions.
Querix
Querix [31] allows users to enter questions in natural language to query an ontology. If the system identifies any ambiguities in the input question, it asks the users for clarification in a dialog. Querix uses a syntax tree to extract the sequence of words from the main word categories: noun (N), verb (V), preposition (P), wh-pronoun (Q, e.g., what, where, when, etc.) and conjunction (C). This sequence is called the query skeleton. The query skeleton is used to enrich nouns and verbs and to identify subject-property-object patterns in the query.
In contrast to ATHENA, which uses many different tools and technologies, Querix only uses the information of the query skeleton (parse tree) and synonyms (for both the input question and the ontology) to translate the input question into SPARQL. For the translation, Querix uses three components: the query analyzer, the matching center and the query generator. The query analyzer handles two tasks: (1) it applies the Stanford Parser to the input question to generate a syntax tree, from which Querix extracts the query skeleton; for example, the query skeleton 'Q-V-N-P-N' is extracted from the input question (Q1) 'Who (Q) is (V) the director (N) of (P) "Inglourious Basterds" (N)?'; (2) it enriches all nouns and verbs with synonyms provided by WordNet.
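The skeleton extraction can be approximated with a PoS tagger, as in the following sketch (NLTK stands in for the Stanford Parser, and adjacent tokens of the same category are merged, which is a simplification; the tokenizer and tagger models must be downloaded once via nltk.download):

```python
# Sketch of query-skeleton extraction from PoS tags.
from itertools import groupby
import nltk

# Map the first letter of a Penn Treebank tag to Querix's categories:
# WP/WRB -> Q, VB* -> V, NN* -> N, IN -> P, CC -> C; everything else dropped.
CATEGORY = {"W": "Q", "V": "V", "N": "N", "I": "P", "C": "C"}

def query_skeleton(question: str) -> str:
    tags = nltk.pos_tag(nltk.word_tokenize(question))
    cats = (CATEGORY.get(tag[0]) for _, tag in tags)
    return "-".join(k for k, _ in groupby(c for c in cats if c))

print(query_skeleton('Who is the director of "Inglourious Basterds"?'))
# 'Q-V-N-P-N'
```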
The matching center is the core component of Querix: (1) it tries to match the query skeleton with a small set of heuristic patterns, which are used to identify subject-property-object patterns in the input question; (2) it searches for matches between the nouns and verbs of the input question and the resources in the ontology (including synonyms); (3) it tries to join the results of the two previous steps. The query generator then composes SPARQL queries from the joined triples delivered by the last step of the matching center. If there are several different solutions with the highest cost score, Querix consults the users by showing a menu from which they can choose the intended meaning.
The simplicity of Querix is both a strength and a weakness: it is simple to use and completely portable, but this simplicity also reduces the number of questions that can be answered since they have to adhere to a predefined syntax.
FREyA (Feedback, Refinement and Extended vocabularY Aggregation)
FREyA [12] is based on QuestIO. It allows users to enter queries in any form in English. It generates a syntactic parse tree in order to identify the answer type. The translation process starts with a lookup, annotating query terms with ontology concepts by using heuristic rules. If there are ambiguous annotations, the users will be engaged in a clarification dialog. The users' selections are saved and used for training the system in order to improve its performance over time. In the end, the system generates a SPARQL query.
The strength of this system is the user interaction, which not only supports the users in finding the right answer, but also improves FREyA over time. Furthermore, FREyA performs an answer type identification, which leads to more precise answers. The weakness is that only a few of the questions can be answered without any clarification dialogs. Furthermore, the system cannot answer negations.
BELA
BELA [58] is an NLI with a layered approach. This means that, at each layer, the best hypothesis is determined. If the confidence for a particular interpretation is 1 and the SPARQL query generated from it produces an answer with at least one result, the translation process is stopped and the answer is returned to the user. Only for ASK-questions (which have yes/no answers) does the process continue until the confidences of the interpretations start to differ; then a threshold of 0.9 is applied, and an empty result (which equals a no-answer) is also accepted.
Similar to Querix, BELA parses the input question and produces a set of query templates which mirror the semantic structure. The next step is a lookup in the index, including Wikipedia redirects (this corresponds to the translation index of ATHENA). The first lookup is without fuzzy matching; if no result can be found, a threshold of 0.95 is applied for matching with normalized Levenshtein distance. The third lookup enables the usage of synonyms, hypernyms and hyponyms from WordNet. The last lookup (and step) uses Explicit Semantic Analysis, which can relate expressions like 'playing' to concepts like 'actor.'
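The lookup cascade can be sketched as follows (toy index; difflib's similarity ratio stands in for BELA's normalized Levenshtein distance, and the ESA stage is stubbed out):

```python
# Sketch of BELA's layered lookup: exact match, then fuzzy match with a
# 0.95 threshold, then WordNet-style synonyms, then (omitted) ESA.
from difflib import SequenceMatcher

index = {"actor": "dbo:Actor", "movie": "dbo:Film"}
synonyms = {"film": "movie"}          # would come from WordNet

def lookup(term: str):
    if term in index:                              # stage 1: exact match
        return index[term]
    for key in index:                              # stage 2: fuzzy >= 0.95
        if SequenceMatcher(None, term, key).ratio() >= 0.95:
            return index[key]
    if term in synonyms:                           # stage 3: synonyms
        return lookup(synonyms[term])
    return None                                    # stage 4: ESA would go here

print(lookup("film"))   # 'dbo:Film', found via the synonym stage
```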
Compared to other systems, BELA not only focuses on solving the translation task, but also reduces the computation time, which increases the user-friendliness.
USI Answers
USI Answers [59] is an NLI for semi-structured industry data. It offers natural language access to the large bodies of data that are used in the planning and delivery of services by Siemens Energy. The database consists of multiple sources, for example, domain-specific ontologies, different relational databases and SPARQL endpoints, which can contain either internal or external knowledge (e.g., DBpedia). Furthermore, the users demanded the ability to use not only natural language questions and formal query language constructs but also keyword questions or a mixture of these.
In contrast to the previous parsing-based NLIs, USI Answers cannot rely on a parse tree, because of the different types of possible input (e.g., keyword questions). Still, the first step applies various NLP technologies, such as lemmatization, PoS tagging, named entity recognition and dependency parsing. To handle domain-specific input terms, a list of syntax rules is used to revalidate the entity information after the NLP technologies have been applied. Furthermore, the question focus (i.e., the referenced entity object) is identified by applying 12 syntactic rules. Enriched with this information, the next step aims to identify and resolve the different concepts that may be in the input question (lookup). Next, USI Answers detects and prevalidates the different relations and objects found in the previous step. The fourth step generates different question interpretations, including how concepts and instances are connected to each other. Afterward, the different interpretations are validated and ranked. This is done with a learned model, based on user-defined views. The last step constructs the final query in the respective query language.
One strength of USI Answers is the ability to query different database resources. Furthermore, the users are free to choose their preferred type of input questions.
NaLIX (Natural Language Interface to XML)
NaLIX [38] is an interactive NLI for querying an XML database with XQuery. The interaction is based on guiding the users to pose questions that the system can handle by providing meaningful feedback and helpful rephrasing suggestions. Furthermore, NaLIX provides templates and a question history to the users. It preserves the users' prior search efforts and provides the users with a quick starting point when they create new questions.
Similar to Querix, NaLIX mostly uses the parse tree for the translation of an input question into XQuery. After MiniPar is used to parse the input question, NaLIX identifies the phrases in the parse tree that can be mapped to XQuery components and classifies them. In the next step, NaLIX validates the classified parse tree from the previous step: it checks whether it knows how to translate the classified parse tree into XQuery and whether all attribute names and values can be found in the database. The last step translates the classified parse tree into an appropriate XQuery expression if possible. During both the classification and the validation of the parse tree, information about errors (e.g., unknown phrases and invalid parse trees) is collected to report to the users for clarification.
The strength of NaLIX is the ability to solve difficult questions, which can include subqueries, by using and adjusting the parse tree. On the other hand, the reliance on a parse tree is also a weakness, because the system can only answer questions that are parseable.
NaLIR (Natural Language Interface for Relational databases)
NaLIR [35, 36] is a further development of NaLIX. Instead of translating into XQuery to query XML databases, NaLIR translates the input question into SQL and queries relational databases. NaLIR returns not only the result to the user, but also a rephrased version of the input question based on the translation.
The basic idea remains the same: parse the input question using the Stanford Parser and map the parse tree to SQL (instead of XQuery). The steps are similar, with some modifications: in the step of mapping phrases of the parse tree to SQL components, the users are asked for clarification if there are ambiguities. The next step is no longer a validation but an adjustment of the parse tree in such a way that it becomes valid. The candidate interpretations produced by this adjusted and valid parse tree are presented to the users, who select the correct interpretation. The last step translates the adjusted and valid parse tree which the users have selected into SQL.
The strength of NaLIR is the interaction with the user, which improved further compared to NaLIX. The weakness remains: it is highly dependent on the parse tree.
BioSmart
BioSmart [27] uses a syntactic classification of the input question to translate the natural language question into a declarative language such as SQL. The system divides the input questions into three query types (iterative, conditional or imperative) by using the parse tree of the input question and comparing it to parse tree templates for each query type. For example, an imperative query consists of a verb (VB) followed by an object (NP). A more expressive and therefore more complex input question can be built by nesting simple query types arbitrarily.
Similar to Querix, BioSmart uses the Stanford parser to parse the input question. The system then tries to map the resulting parse tree to predefined questions or to one of the query templates (query type identification). As mentioned, it is possible that a question consists of several of these templates to capture the meaning of the question. Then, the tables and possible joins that are needed to compute the query are identified. Afterward, the templates are transformed into a logical query using the information about the tables and joins.
Compared to other NLIs, the strength of BioSmart is the possibility to query arbitrary databases. The weakness of BioSmart is the mapping to the three query types: if the system cannot match the input question to one of those query types, it is not able to answer the question.
Grammar-based systems
Grammar-based NLIs use a different approach. The core of these systems is a set of rules (a grammar) that defines the questions that can be understood and answered by the system. Using those rules, the system is able to give the users suggestions on how to complete their questions while typing. This supports the users in writing understandable questions and gives them insight into the type of questions that can be asked. The disadvantage of these systems is that they are highly domain-dependent: the rules need to be written by hand.
In the following, we summarize seven grammar-based NLIs. We decided to describe TR Discover [52] in depth, because it is well described in the paper, such that we can provide examples for the whole process, including the grammar rules. Furthermore, it uses the rules to guide and help the users while formulating their questions. Afterward, other grammar-based systems are summarized, and we describe the delta to TR Discover. The differences can be quite significant; however, they all have the same core: a set of rules.
TR Discover
TR Discover [52] is a system providing an NLI which translates an input question in the form of an English sentence (or sentence fragment) into SQL or SPARQL. It can be used for relational databases or ontologies, but does not need an ontology to work with relational databases. During the translation steps, TR Discover uses a First Order Logic (FOL) representation as an intermediate language. Furthermore, it provides auto-suggestions based on the user input. There are two types of suggestions: auto-completion and prediction.
TR Discover helps the users in formulating the question through an auto-suggestion feature. For example, assume the users want to know the director of the movie 'Inglourious Basterds' (Q1). When the users start typing 'p,' TR Discover will not only suggest 'person' but also longer phrases like 'person directing' (auto-completion). After 'person directing' is selected (or typed), TR Discover will again suggest phrases, like 'movies' or even specific movies like 'Inglourious Basterds' (prediction). For input question Q1, the input could be 'person directing Inglourious Basterds.'
The suggestions are based on the relationships and entities in the dataset and use the linguistic constraints encoded in a feature-based context-free grammar (FCFG). The grammar consists of grammar rules (G1-G3) and lexical entries (L1-L2). For the sample world (and the input question Q1), the following rules could be defined:
- G1: NP → N
- G2: NP → NP VP
- G3: VP → V NP
- L1: N[TYPE=person, NUM=sg, SEM=<λx.person(x)>] → person
- L2: V[TYPE=[person,movie,title], SEM=<λX.λx.X(λy.directMovie(y,x))>, TNS=presp] → directing
The suggestions are computed based on the idea of left-corner parsing: given a query segment, it finds all grammar rules whose left corner on the right side matches the left side of the lexical entry of the query segment. Then, all leaf nodes (lexical entries) in the grammar that can be reached by using the adjacent element are found. For example, while typing ‘person’ (Q1), the lexical entries L1 and L2 are found and provided to the user.
TR Discover uses three steps to translate the English sentence or sentence fragment into a SQL or SPARQL query. The first step parses the input question into a FOL representation, using the FCFG. For the example input, the token 'person' is parsed by the lexical entry L1, and the token 'directing' is parsed with the lexical entry L2. This leads to the FOL representation:
λx.person(x) → directMovie(y,x) & type(y,Movie) & label(y, 'Inglourious Basterds')
How exactly the phrase 'Inglourious Basterds' is matched to the base data, and can therefore be used as part of the lexical entry L2, and how it is resolved is not explained by Song et al. [52]. If there are multiple possibilities to parse the input question, the first one is chosen.
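As a rough illustration of the first step, NLTK's feature grammar support can parse the example fragment with a toy version of the FCFG above (the features are heavily simplified, and the SEM feature carrying the FOL fragments is omitted):

```python
# Toy version of TR Discover's parsing step with an NLTK feature
# grammar standing in for the paper's FCFG machinery.
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

grammar = FeatureGrammar.fromstring("""
    % start NP
    NP -> N | NP VP
    VP -> V NP
    N[TYPE=person] -> 'person'
    V[TYPE=direct] -> 'directing'
    NP[TYPE=movie] -> 'Inglourious_Basterds'
""")
parser = FeatureChartParser(grammar)
for tree in parser.parse("person directing Inglourious_Basterds".split()):
    print(tree)
# (NP (NP (N person)) (VP (V directing) (NP Inglourious_Basterds)))
```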
The next step is to translate the generated FOL into a parse tree. The FOL parser takes a grammar and the FOL representation from the previous step and generates a parse tree (Fig. 12), using ANTLR for the implementation.
In the third step, an in-order traversal of the parse tree (provided by the previous step) is performed to translate it into an executable SQL or SPARQL query. During the traversal, the atomic logical conditions and connectors are pushed onto a stack. After the traversal, the constraints are popped from the stack to build the correct query constraints. The predicates are mapped to their corresponding attribute names (SQL) or ontology properties (SPARQL).
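The stack-based assembly of the query constraints can be illustrated in a few lines (atomic conditions are invented, and the traversal order is simplified to a postorder sequence):

```python
# Sketch of stack-based constraint assembly from a traversed parse tree.
def build_where(postorder_nodes):
    stack = []
    for node in postorder_nodes:
        if node in ("AND", "OR"):
            right, left = stack.pop(), stack.pop()
            stack.append(f"({left} {node} {right})")
        else:
            stack.append(node)       # atomic condition
    return stack.pop()

print(build_where(["type = 'Movie'",
                   "title = 'Inglourious Basterds'", "AND"]))
# (type = 'Movie' AND title = 'Inglourious Basterds')
```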
The strengths of TR Discover are the auto-suggestion and the possibility to translate natural language into different query languages such as SQL and SPARQL, because FOL is used as an intermediate language.
The weaknesses of TR Discover are that quantifiers (e.g., Q3: ‘grossed most’) cannot be used, synonyms are not properly handled, and negations only work for SPARQL.
Song et al. [52] suggest extending TR Discover with a ranking system for multiple parses in the first step and improving the synonym handling. Furthermore, they point out the possibility of applying user query logs to improve the auto-suggestions.
Ginseng (Guided Input Natural language Search ENGine)
Ginseng [5] is a guided input NLI for ontologies. The system is based on a grammar that describes both the parse rules of the input questions and the query composition elements for the RDF Data Query Language (RDQL) queries. The grammar is used to guide the users in formulating questions in English.
In contrast to TR Discover, Ginseng does not use an intermediate representation; the parsing process translates directly into RDQL. The grammar rules are divided into two categories: dynamic and static grammar rules. The dynamic grammar rules are generated from the OWL ontologies. They include rules for each class, instance, object property, data type property and synonym. The static grammar rules consist of about 120 mostly empirically constructed domain-independent rules, which provide the basic sentence structures and phrases for input questions. The naming conventions used by Ginseng differ slightly from those used by TR Discover: Ginseng's dynamic rules correspond to TR Discover's lexical entries, and Ginseng's static rules comprise both grammar rules and lexical entries in TR Discover.
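The generation of dynamic rules can be imagined as follows (toy ontology as a plain dict; a real implementation would read the OWL ontology, for example with rdflib):

```python
# Sketch of generating Ginseng-style dynamic grammar rules from an
# ontology, including synonym entries.
ontology = {
    "classes": ["Movie", "Person"],
    "object_properties": ["directedBy"],
    "synonyms": {"directedBy": ["directed by", "director of"]},
}

def dynamic_rules(onto):
    rules = []
    for cls in onto["classes"]:
        rules.append(f"N -> '{cls.lower()}'")
    for prop in onto["object_properties"]:
        for phrase in [prop] + onto["synonyms"].get(prop, []):
            rules.append(f"V -> '{phrase}'")
    return rules

print("\n".join(dynamic_rules(ontology)))
# N -> 'movie' / N -> 'person' / V -> 'directedBy' / V -> 'directed by' ...
```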
The strength of Ginseng lies in the dynamic rules, which are generated from the ontology. Together with the domain-independent static rules, this leads to easier adaptability compared to systems like TR Discover. The weakness lies in the grammar rules: they need to cover all possible types of questions the users want to ask.
SQUALL (Semantic Query and Update High-Level Language)
SQUALL [17, 18] is an NLI for searching and updating an RDF store. It uses the style of Montague grammars (context-free generative grammar) as an intermediate language (similar to TR Discover) to split the translation process into two parts: translating the natural language input question into a logical language, and translating the logical language into a query language. This makes the second part easier: the logical language and the query language share the same semantics and level of detail. The grammar of SQUALL consists of about 120 domain-independent rules.
The translation into the logical form is done in three steps. In the first step, the keywords are recognized (lookup step). The second step is a syntactic analysis based on a descending parser, which is fed with the grammar rules. Based on the definitions in the grammar, the third step then generates the logical language representation. After the translation into the logical language, the translation into the chosen formal language can be done.
The strength of SQUALL is that it is able to translate any type of input question, including aggregations, negations and subqueries. The weakness of SQUALL is that the users have to know the RDF vocabulary (e.g., classes and properties). For example, the input question Q1 needs to be formulated as 'Who is the director of Inglourious_Basterds?'
MEANS (MEdical question ANSwering)
MEANS [3] is an NLI that uses a hybrid approach of patterns and ML to identify semantic relations. It is highly domain-dependent and focuses on factual questions expressed by wh-pronouns and Boolean questions in a medical subfield targeting the seven medical categories: problem, treatment, test, sign/symptom, drug, food and patient.
To translate the input question into SPARQL, MEANS first classifies the input question into one of ten categories (e.g., factoid, list, definition, etc.). If the question is categorized as a wh-question, the Expected Answer Type (EAT) is identified and replaced with ‘ANSWER’ as a simplified form for the next step. For example, the EAT of the input question Q1 would be ‘director.’ In the next step, MEANS identifies medical entities using a Conditional Random Field (CRF) classifier and rules to map noun phrases to concepts. The next step is used to identify seven predefined semantic relations. The annotator is a hybrid approach based on a set of manually constructed patterns and a Support Vector Machine (SVM) classifier.
The strength of MEANS is that it can handle different types of questions, including questions with more than one expected answer type and more than one focus. Like most grammar-based NLIs, MEANS suffers from the restrictions of handcrafted rules. The inclusion of ML reduces this problem, but ML itself needs a huge training corpus to be usable. Furthermore, comparisons (and also negations) are not taken into account.
AskNow
AskNow [14] uses a novel query characterization structure that is resilient to paraphrasing, called Normalized Query Structure (NQS), which is less sensitive to structural variation in the input question. The identification of the elements in the NQS is highly dependent on PoS tags. For example, the input question Q1 'Who is the director of "Inglourious Basterds"?' would be matched to the NQS template:
[Wh][R1][D][R2][I], where [Wh] is the question word ‘Who,’ [R1] is the auxiliary relation ‘is,’ [D] is the query desire class ‘director,’ [R2] the relation ‘of’ and [I] is the query input class ‘Inglourious Basterds.’
To translate the input question into SPARQL, AskNow first identifies the substructures using a PoS tagger and named entity recognition. Then, it fits the substructures into their corresponding cells within the generic NQS templates. Afterward, the query type (set, boolean, ranking, count or property value) is identified based on desire and wh-type. In the next step, the query desire, the query input and their relations are matched to the KB. As an example, Spotlight can be used for the matching to DBpedia. During the matching process, AskNow uses WordNet synonyms and a BOA pattern library (bootstrapping).
The strength of AskNow, compared to the previous grammar-based systems, is that the users are free to formulate their questions without restrictions. In addition, the NQS templates allow complex questions which, for example, can include subqueries. One weakness of AskNow is that it highly depends on correct PoS tags, which restricts the types of questions that can be asked.
SPARKLIS
SPARKLIS [19] is a guided query builder for SPARQL that uses natural language for better understanding. It guides the users during query phrasing by giving them the possibility to search through concepts, entities and modifiers in natural language. It relies on the rules of SPARQL to ensure syntactically correct SPARQL queries at all times during the process. The interaction with the system makes the question formulation more constrained, slower and less spontaneous, but it provides guidance and safeness with intermediate answers and suggestions at each step. The translation process of SPARKLIS is reversed: it translates possible SPARQL queries into natural language such that the users can understand their choices.
The auto-completion differs from the previous systems (e.g., TR Discover and Ginseng): the interface displays three lists where the users can search for concepts, entities or modifiers. To ensure completeness relative to the user input, SPARKLIS uses a cascade of three stages. The first stage is on the client side, where a partial list of suggestions is filtered. The second stage is executed if the filtered list becomes empty: the suggestions are recomputed by sending the query, including the user's filter, to the SPARQL endpoint. The last stage is triggered if the list is still empty: new queries are again sent to the SPARQL endpoint, using the full SPARQL query instead of partial results. Only a limited number of suggestions is computed, and no ranking is applied because of scalability issues.
The strength of SPARKLIS is also its weakness: the restricted guidance during the query formulation process allows only syntactically correct questions, but at the same time, the users' freedom is reduced. Furthermore, the limited number of suggestions has the negative consequence that they may be incomplete, making some queries unreachable.
GFMed
GFMed [41] is an NLI for biomedical linked data. It applies grammars manually built with the Grammatical Framework (GF). GF grammars are divided into abstract and concrete grammars. The abstract grammar defines the semantic model of the input language; for GFMed, this is based on the biomedical domain. The concrete grammars define the syntax of the input languages, which are English and SPARQL. Furthermore, GF supports multilingual applications, and because of that, Romanian is included as a second natural language in GFMed.
To translate the controlled natural language input into SPARQL, GFMed relies in a first step on the libraries of GF for syntax, morphological paradigms and coordination. GFMed covers the basic elements of SPARQL to support term constraints, aggregates and negations. There is no support for property paths of length different from 1, optional graph patterns or assignment. Furthermore, only equality and regular expression operators are included.
The strength of GFMed is that it covers the basic elements of SPARQL. Furthermore, it introduces a second natural language besides English for the users to ask questions. The weakness is the GF grammars, which are domain-dependent and restrict the number of questions that can be asked by the users.