In this section, we describe in detail the architecture and the different aspects of our knowledge retrieval framework. In the first subsection, we describe the dataset from which we extracted knowledge and fused it into our schema. Next, we present the ontology that is the main component of our framework. In the last subsection, we describe the algorithm that semantically matches entities from DBpedia, ConceptNet, and WordNet with entities in our KB.
3.1 Household Dataset
The VirtualHome dataset [17, 23] contains activities that people perform at home. For each activity, there are different descriptions of how to perform it. The descriptions take the form of sequences of actions, i.e., steps that associate an action with one or more objects, as illustrated in Example 1. Moreover, the dataset offers a virtual environment representation of each sequence of actions in Unity. The dataset contains \(\sim \)2800 sequences of actions for human-scale activities, and it holds more than 500 objects, usually found in a household environment, which are semantically connected with each other and with specific human-scale actions.
Example 1
Browse Internet
Comment: walk to living room. look at computer. switch on computer. sit in chair. watch computer. switch off computer.
Each sequence of actions follows a template: (a) the Activity Label, (b) the Comment, i.e., a small description, and (c) the sequence of actions. Each step has the general form shown in (1):
$$\begin{aligned}{}[Action] \langle Object_1 \rangle (ID_1) \ldots \langle Object_n \rangle (ID_n) \end{aligned}$$
(1)
where Action is the human-scale action, \(Object_1,\ldots ,Object_n\) are the objects on which the action is performed \(\left( n \in \mathbb {N}\right) \), and \(ID_1,\ldots , ID_n\) are unique identifiers indicating which object occurrences refer to the same natural object. In our experiments we have approximately 500 objects, but since the ontology can be freely extended with new objects, we treat n as an arbitrary natural number.
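To make the step format (1) concrete, the following minimal Python sketch (our own illustration, not part of the framework's code) parses a step string into its action and its object-identifier pairs:

```python
import re

# Steps have the form: [Action] <object_1> (id_1) ... <object_n> (id_n)
STEP_PATTERN = re.compile(r"\[(?P<action>[^\]]+)\]|<(?P<obj>[^>]+)>\s*\((?P<id>\d+)\)")

def parse_step(step: str):
    """Return (action, [(object, id), ...]) for a single step string."""
    action, objects = None, []
    for match in STEP_PATTERN.finditer(step):
        if match.group("action"):
            action = match.group("action")
        else:
            objects.append((match.group("obj"), int(match.group("id"))))
    return action, objects

print(parse_step("[SwitchOn] <computer> (1)"))  # ('SwitchOn', [('computer', 1)])
```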
3.2 Ontology
The main component of our knowledge retrieval framework is the ontology, which was inspired by the VirtualHome dataset. Figure 1a presents part of the ontology concepts, while Fig. 1b shows the relationships between the major concepts.
The class Activity contains subclasses that follow the hierarchy provided by the dataset; these were hand-coded. Moreover, the instances of these classes are the sequences of actions present in the KB of the dataset. The class Activity is connected through the property listOfSteps with the class Step. Additionally, the class Step is connected through the properties object and step_type with the classes ObjectType and StepType, respectively. The class ObjectType contains the labels of all the objects found in the sequences, while the class StepType similarly provides natural language labels for the steps.
We have represented every sequence of actions as a list, because this gives us stronger coherence and better interaction with the knowledge provided by the activity. Thus, we can answer queries like "What is the third step in the sequence of activity X?", or "Return all the sequences where I first walk to the living room, then I switch on the TV, and after that I sit on the sofa", information that is crucial for a system with planning capabilities. We have also developed an instance generator algorithm that transforms the sequences of actions from the form of Example 1 into instances of classes in our ontology. The class to which a sequence belongs is provided by the Activity label. We give such an instance in Example 2.
Example 2
Each step shown in the property listOfSteps is an instance of the class Step. Each step has a unique ID that distinguishes it from all other steps. Example 3 shows an instance step from the listOfSteps, and Example 4 shows the object and action with which the instance is connected through the ObjectType and StepType classes.
Example 3
Example 4
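To illustrate the structure described above (this is our own sketch and does not reproduce the original Examples 2–4), a step instance and its connections could be built with rdflib in Python, assuming a hypothetical namespace vh: for the ontology classes and properties:

```python
from rdflib import Graph, Namespace, RDF

# Hypothetical namespace; the actual ontology IRI may differ.
VH = Namespace("http://example.org/virtualhome#")

g = Graph()
g.bind("vh", VH)

# A single step instance for "[SwitchOn] <computer> (1)", with an assumed naming scheme.
step = VH["browse_internet_1_step3"]
g.add((step, RDF.type, VH.Step))
g.add((step, VH.step_type, VH.SwitchOn))   # label from the StepType class
g.add((step, VH.object, VH.computer))      # label from the ObjectType class

# The step is attached to its activity; the list structure used by
# listOfSteps is omitted here for brevity.
activity = VH["browse_internet_1"]
g.add((activity, RDF.type, VH.BrowseInternet))
g.add((activity, VH.listOfSteps, step))

print(g.serialize(format="turtle"))
```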
After constructing and populating the ontology, we developed a library in Python that constructs SPARQL queries addressed to the ontology and fetches the answers. The library consists of 9 predefined query templates that represent the most probable question types for the household ontology. These templates were selected as the most important after an extensive literature review of studies about cognitive robotic systems that act in a household environment [9]. Among many other studies, we primarily considered KnowRob [2, 31], RoboSherlock [1], RoboBrain [29], and RoboCSE [7]. From these works we identified the most common and crucial queries addressed to a cognitive robotic system, and we constructed the templates based on these findings. Example 5 shows the SPARQL template that returns the objects related to two other objects, Object1 and Object2.
Example 5
Alternatively, ad-hoc SPARQL queries can be posed to the ontology, such as Example 6, where a user wants to see the objects involved in the activity activity1.
Example 6
Therefore, users can hand-pick one of the predefined queries and then provide the keywords needed to fill the SPARQL template (Example 5), or they can write their own SPARQL query to access the information they desire (Example 6).
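As a minimal sketch of how a predefined template might be filled and executed with rdflib (the actual template text, namespace, and library API are not given here, so everything below is illustrative):

```python
from rdflib import Graph

g = Graph()
g.parse("household_ontology.owl", format="xml")  # hypothetical file name and serialization

# A template in the spirit of Example 5: objects that co-occur in an activity
# with two given objects (the real predefined template may differ).
RELATED_OBJECTS_TEMPLATE = """
PREFIX vh: <http://example.org/virtualhome#>
SELECT DISTINCT ?other WHERE {{
    ?act vh:listOfSteps ?s1 . ?s1 vh:object vh:{obj1} .
    ?act vh:listOfSteps ?s2 . ?s2 vh:object vh:{obj2} .
    ?act vh:listOfSteps ?s3 . ?s3 vh:object ?other .
    FILTER (?other NOT IN (vh:{obj1}, vh:{obj2}))
}}
"""

for row in g.query(RELATED_OBJECTS_TEMPLATE.format(obj1="computer", obj2="chair")):
    print(row.other)
```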
3.3 Semantic Matching Algorithm
Because the dataset upon which the knowledge retrieval framework was constructed has a finite number of objects, we developed a mechanism that takes advantage of the web knowledge graphs DBpedia, ConceptNet, and WordNet in order to retrieve knowledge about objects on a larger scale and answer queries about objects that do not exist in our KB. This broadens the range of queries that the framework can answer and overcomes the downside of our framework being dataset oriented. Algorithm 1 was implemented in Python; the libraries Requests and NLTK provide programmatic access to all three aforementioned ontologies. Similar methods can be found in [12, 35], which also exploit the CS knowledge existing in web ontologies. Algorithm 1 starts by getting as input any word that is part of the English language; we check this by obtaining the WordNet entity, line 3. The input is given by the user implicitly, when they give a keyword in a query that does not exist in the KB of the framework.
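The WordNet membership check of line 3 can be done with NLTK roughly as follows (a sketch of the idea, not the framework's exact code):

```python
from nltk.corpus import wordnet as wn

def in_wordnet(word: str) -> bool:
    """A word is accepted if WordNet knows at least one synset for it."""
    return len(wn.synsets(word)) > 0

print(in_wordnet("coffee"))   # True
print(in_wordnet("qwzrtx"))   # False
```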
Subsequently, we turn to ConceptNet and collect the properties and values for the input word, line 4. In our framework, we collect only the values of properties such as RelatedTo, UsedFor, AtLocation, and IsA. We chose these properties because they are the most relevant to our target application of providing information about household objects. We also acquire the weights that ConceptNet offers for each triplet. These weights represent how strong the connection between two entities is with respect to a property in the ConceptNet graph, and are defined by the ConceptNet community. Therefore, we end up with a hash map of the following form:
$$\Big \{Property_1:\left[ \left( entity^{1}_1, weight^{1}_1\right) , \ldots ,\left( entity^{1}_m, weight^{1}_m\right) \right] ,\ldots ,$$
$$Property_l:\left[ \left( entity^{l}_1, weight^{l}_1\right) , \ldots ,\left( entity^{l}_k, weight^{l}_k\right) \right] \Big \}$$
for \(m,l,k \in \mathbb {N}\backslash \{0\}\).
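The collection of property values and weights can be sketched against the public ConceptNet 5 REST API with Requests; the relation set mirrors the description above, while the helper name and result shape are our own choices:

```python
import requests

RELATIONS = ["RelatedTo", "UsedFor", "AtLocation", "IsA"]

def conceptnet_neighbours(word: str, limit: int = 50) -> dict:
    """Return {property: [(entity, weight), ...]} for the chosen relations."""
    neighbours = {rel: [] for rel in RELATIONS}
    for rel in RELATIONS:
        resp = requests.get(
            "https://api.conceptnet.io/query",
            params={"start": f"/c/en/{word}", "rel": f"/r/{rel}", "limit": limit},
        ).json()
        for edge in resp.get("edges", []):
            value = edge["end"]["label"].replace(" ", "_")
            neighbours[rel].append((value, edge["weight"]))
    return neighbours

print(conceptnet_neighbours("coffee")["AtLocation"][:3])
```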
Then, we start extracting the semantic similarity between the given entity and the returned property values using WordNet and DBpedia, lines 5–8. Firstly, we find in WordNet the least common path between the given entity and each value returned from ConceptNet, line 9. The knowledge in WordNet forms a directed acyclic graph of hyponyms and hypernyms; thus, in each case we obtain the number of steps needed to traverse from one entity to the other. Subsequently, we turn to DBpedia to extract the comment box of each entity using SPARQL, lines 11–13. If DBpedia does not return any results, we search for the entity in Wikipedia, which has a better search engine, and with the returned URL we query DBpedia again for the comment box, based on the mapping scheme between Wikipedia URLs and DBpedia URIs, lines 14–20. Notice that when we encounter a redirection list we take the first URL of the list, which in most cases is the desired entity, and acquire its comment box.
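Fetching a comment box from DBpedia's public SPARQL endpoint can be sketched as follows; the Wikipedia fallback and redirection handling are omitted, and the resource naming (capitalising the keyword) is a simplification:

```python
import requests

DBPEDIA_SPARQL = "https://dbpedia.org/sparql"

def comment_box(entity: str):
    """Fetch the English rdfs:comment of a DBpedia resource, or None."""
    query = f"""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?comment WHERE {{
        <http://dbpedia.org/resource/{entity.capitalize()}> rdfs:comment ?comment .
        FILTER (lang(?comment) = "en")
    }}
    """
    resp = requests.get(
        DBPEDIA_SPARQL,
        params={"query": query, "format": "application/sparql-results+json"},
    ).json()
    bindings = resp["results"]["bindings"]
    return bindings[0]["comment"]["value"] if bindings else None

print(comment_box("coffee"))
```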
The comment box of the input entity is compared with the comment box of each entity returned from ConceptNet, using the TF-IDF algorithm to extract semantic similarity, line 21. Here we rely on the assumption that the descriptions of two semantically related objects will contain common words. We preferred TF-IDF despite its limitations (it may miss words that differ by only a single letter), because we did not want to raise the complexity of the framework by using pre-trained embedding vectors like GloVe [21], Word2Vec [26], or FastText [14]; this remains future work.
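The TF-IDF comparison of two comment boxes can be realised, for instance, with scikit-learn's vectorizer and cosine similarity; the exact implementation is not stated here, so this is only one plausible variant:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_score(comment_a: str, comment_b: str) -> float:
    """Cosine similarity between the TF-IDF vectors of two comment boxes."""
    vectors = TfidfVectorizer(stop_words="english").fit_transform([comment_a, comment_b])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

print(tfidf_score("Coffee is a brewed drink prepared from roasted coffee beans.",
                  "A mug is a type of cup used for hot drinks such as coffee."))
```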
In order to define the semantic similarity between the entities, we devised a new metric that combines WordNet paths, TF-IDF scores, and ConceptNet weights, Eq. (2). We chose this metric because it takes into consideration the smallest WordNet path, the ConceptNet weights, and the TF-IDF scores. The TF-IDF and ConceptNet scores contribute positively to the semantic similarity of two words, whereas the longer the path between two words in WordNet, the smaller their semantic similarity.
$$\begin{aligned} Sim(i, v) = \frac{1}{WNP(i, v)} + TFIDF(i, v) + CNW(i, p, v) \end{aligned}$$
(2)
In Eq. 2, i is the entity given as input by the user, and v is each of the different values returned from the ConceptNet properties. CNW(i, p, v) is the weight that ConceptNet gives to the triplet (i, p, v), where p stands for the property that connects i and v. TFIDF(i, v) is the score returned by the TF-IDF algorithm when comparing the DBpedia comment boxes of i and v. WNP(i, v) is a two-parameter function that returns the least common path between i and v in the WordNet directed acyclic graph.
In case i and v have at least one common hypernym (ch), we take the smallest path between the two words, whereas in case i and v do not have a common hypernym (nch), we add their depths. Let \(depth(\cdot )\) be the function that returns the number of steps needed to reach a given entity from the root of WordNet; then:
$$\begin{aligned} WNP(i, v) = \left\{ \begin{array}{ll} \min _{c \in C}\left\{ depth(i) + depth(v) - 2\, depth(c)\right\} &{} \text {ch} \\ depth(i) + depth(v) &{} \text {nch} \end{array}\right. \end{aligned}$$
(3)
where C is the set of common hypernyms of i and v. \(WNP(\cdot ,\cdot )\) will never be zero, as two different entities in a directed acyclic graph are always at least one step apart.
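Equations (2) and (3) can be computed with NLTK's WordNet interface roughly as follows; taking the first synset of each word is our own simplification, since the mapping from words to synsets is not specified above:

```python
from nltk.corpus import wordnet as wn

def wnp(word_i: str, word_v: str) -> int:
    """Least common path of Eq. (3); both words are assumed to pass the WordNet check."""
    syn_i, syn_v = wn.synsets(word_i)[0], wn.synsets(word_v)[0]
    common = syn_i.lowest_common_hypernyms(syn_v)
    if common:  # case "ch": shortest path through a common hypernym
        return min(syn_i.min_depth() + syn_v.min_depth() - 2 * c.min_depth()
                   for c in common)
    return syn_i.min_depth() + syn_v.min_depth()  # case "nch"

def sim(word_i: str, word_v: str, tfidf: float, cn_weight: float) -> float:
    """Semantic similarity of Eq. (2) for two distinct words."""
    return 1.0 / wnp(word_i, word_v) + tfidf + cn_weight

# Illustrative call with made-up TF-IDF and ConceptNet scores.
print(sim("coffee", "mug", tfidf=0.18, cn_weight=2.0))
```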
The last step of the algorithm sorts the semantic similarity results of the entities with respect to each ConceptNet property, and stores the new information in a hash map, line 24. An example of the returned information is given in Example 7, where the Top-5 entities for each property are displayed, if that many exist.
Example 7
- coffee IsA: stimulant, beverage, acquired_taste, liquid.
- coffee AtLocation: sugar, mug, office, cafe.
- coffee RelatedTo: cappuccino, iced_coffee, irish_coffee, turkish_coffee, plant.
- coffee UsedFor: refill.