Abstract
Recent years have shown that deep learning models pre-trained on large text corpora with the language model objective can help solve various tasks requiring natural language understanding. However, many commonsense concepts are underrepresented in online resources because they are too obvious for most humans. To address this problem, we propose the use of affordances – commonsense knowledge that can be injected into models to increase their ability to understand our world. We show that injecting ConceptNet knowledge into BERT-based models leads to an increase in evaluation scores measured on the PIQA dataset.
Keywords
- Commonsense reasoning
- Natural Language Processing
- Deep Learning
- Knowledge Graph
1 Introduction
Equipping computers with the ability to understand the physical world is an important goal of artificial intelligence [2, 8]. In recent years, we have moved closer to reaching it thanks to the rise of large pre-trained transformer-based models. These models may be taught using the language model objective, which requires them to learn to predict the next word in a given sequence or to guess a masked word in a given text passage. Trained on large textual corpora, these models acquire world-related knowledge that helps them choose the right word.
However, a subset of knowledge called commonsense knowledge is not explicitly stated in texts written by humans. Consider, for instance, the presupposition [18] "a trolley is light enough to be pushed by a person" related to the statement "somebody pushed a trolley". Because such ideas are obvious to us and we expect everyone to be aware of them, we usually do not write about them. This problem is especially pronounced for physical commonsense knowledge, on which we concentrate in this paper. Such gaps can be problematic, e.g., when language models are used in embodied agents that must interact with the physical world.
To address this issue, several attempts have been made to formalize commonsense knowledge. A promising idea is to inject such formalized knowledge into pre-trained language models so that they understand our world better. In this work, we utilize the notion of affordances, i.e., relationships between agents and the environment denoting actions that are applicable to objects, based on the properties of those objects (e.g., whether an object is edible or climbable) [8] (Fig. 1). We extract knowledge about affordances from a knowledge graph to enrich the knowledge of popular pre-trained models. This paper's primary research question is whether injecting commonsense knowledge concerning affordances into pre-trained language models improves physical commonsense reasoning.
2 Related Work
Ilievski et al. [11] grouped commonsense knowledge into dimensions to verify which of them impact models, concluding that temporal knowledge, goals, and desires are important dimensions for the models tested. In a different direction, sets of actions that can be performed with a given object in a given environment have been used in visual intelligence for classification and labelling [28]; that work focused on images, unlike our natural-language-oriented work.
In this paper, commonsense reasoning is understood as the ability to make assumptions about ordinary situations humans encounter in their daily lives. Among the datasets related to this concept, several deal with multiple-choice questions [24, 27]. In this paper, we use PIQA [2], a recent dataset focused on physical commonsense. The authors of the dataset show that current pre-trained models struggle with answering the questions collected in PIQA, since these cover knowledge that is rarely explicitly stated in text (e.g., one has to choose whether soup should be eaten with a fork or a spoon).
Some popular approaches to solving tasks requiring commonsense knowledge use GPT [7] or BERT-like models such as BERT [6], RoBERTa [14], ALBERT [12], or DeBERTa [10]. As they all follow the language model training objective, we expect them to possess some world-related knowledge. A GPT-based model [3] achieved 82.8% accuracy on PIQA, and fine-tuning such a model on another task seems to improve its performance consistently [19]. Recently, however, a DeBERTa-based model took the lead, achieving 83.5% accuracy on the leaderboard. There are also PIQA baselines based on BERT [2], but they score lower than DeBERTa and RoBERTa, which seem to be better both at commonsense reasoning and at overall performance on the aforementioned datasets, especially with highly optimized training hyperparameters [14]. It also appears that attention heads capture commonsense knowledge of the kind encoded in graphs [5]. Moreover, UNICORN, a universal commonsense reasoning model trained on a new multitask benchmark that includes PIQA, built on T5 (roughly twice the size of BERT), achieved 90.1% accuracy [15].
More specialized solutions rely on external resources used for fine-tuning or for enriching the model input, such as graphs whose labeled edges represent interactions between actors [22] or relations between causes and effects [20]. Approaches using such resources include querying a model for additional information [21] and combining data with graph knowledge in BERT models for classification [17]. Other works aim to re-define the distance between words using graphs [16], or apply generative data augmentation, which resembles a form of adversarial training [26]. Recently, it was shown that adapter-based knowledge injection into BERT models [13] improves performance on tasks requiring commonsense knowledge.
3 Affordances
The notion of affordances was introduced by Gibson [9] to describe relations between the environment and its agents (e.g., how humans influence the world). Such a relationship between the environment and an agent forms the potential for an action (e.g., humans can turn on a computer). Affordances help to study perception as the awareness of possible actions arising from an agent's perception of the world. As possibilities of action, affordances are very natural to humans. This intuitively known knowledge may be underrepresented in internet-based textual corpora, while in some domains, such as robotics [1], one of the key reasoning tasks is inferring the affordances of objects, i.e., the actions that a robotic agent can accomplish with a given object at hand.
For our use case, we can introduce several restrictions that help to identify affordances:
- (i) An affordance must explain some kind of relation between two agents or concepts, i.e., it needs to address how the two items coincide with or influence each other.
- (ii) An affordance cannot be a physical connection. An affordance is a metaphysical concept (a possibility of action) that connects two items; thus, a cable connecting two computers is not an affordance.
- (iii) An affordance cannot be a synonym. While synonyms are connected by definition, the goal of an affordance is to explain how an agent connects to its counterpart in our world, not to simply state that the two mean the same.
- (iv) An affordance cannot be a relationship based on negation. Many concepts in the world are related in some way, but an affordance must impact, or be able to affect, one of the agents.
4 Datasets
In this work, we use two datasets – PIQA and ConceptNet. PIQA, or "Physical Interaction - Question Answering", is a dataset of goals, each provided with two possible answers (further referred to as solutions). Only one of them is correct, and choosing it requires physical commonsense knowledge. For example, when asked how to eat soup, a model should know to use a spoon rather than a fork. PIQA is divided into train, validation, and test sets.
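For illustration, a minimal loading sketch in Python – assuming the dataset's published JSONL format with goal, sol1, and sol2 fields and a separate per-line label file – could look as follows:

```python
# A minimal sketch of loading PIQA examples; the field names ("goal",
# "sol1", "sol2") and the separate 0/1 label file follow the dataset's
# published distribution format.
import json

def load_piqa(jsonl_path: str, labels_path: str):
    """Yield (goal, sol1, sol2, correct_index) tuples."""
    with open(jsonl_path, encoding="utf-8") as f_data, \
         open(labels_path, encoding="utf-8") as f_labels:
        for line, label in zip(f_data, f_labels):
            ex = json.loads(line)
            yield ex["goal"], ex["sol1"], ex["sol2"], int(label)

# Usage: print the correct solution of the first training example.
for goal, sol1, sol2, label in load_piqa("train.jsonl", "train-labels.lst"):
    print(goal, "->", (sol1, sol2)[label])
    break
```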
ConceptNet is a knowledge graph proposed to represent the general knowledge involved in understanding language, allowing applications to better understand the meanings behind words [23]. It is based on data sources such as WordNet, OpenCyc, and Wikipedia. From all the properties provided in the graph, we chose those that match the affordance requirements defined in Sect. 3, namely: CapableOf, UsedFor, Causes, MotivatedByGoal, CausesDesire, CreatedBy, ReceivesAction, HasSubevent, HasFirstSubevent, HasLastSubevent, HasPrerequisite, MadeOf, LocatedNear, and AtLocation.
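Such edges can be retrieved, for instance, from the public ConceptNet 5 web API. The following sketch illustrates the idea; the endpoint and JSON field names follow the API's documented format, while the helper name and the limit value are our own illustrative choices:

```python
# A hedged sketch of retrieving affordance-like edges from the public
# ConceptNet 5 API (api.conceptnet.io) for the relations chosen above.
import requests

AFFORDANCE_RELATIONS = [
    "CapableOf", "UsedFor", "Causes", "MotivatedByGoal", "CausesDesire",
    "CreatedBy", "ReceivesAction", "HasSubevent", "HasFirstSubevent",
    "HasLastSubevent", "HasPrerequisite", "MadeOf", "LocatedNear", "AtLocation",
]

def affordances_for(keyword: str, limit: int = 5):
    """Return (relation, end-node label) pairs for an English keyword."""
    # ConceptNet URIs use lowercase terms with underscores.
    term = "/c/en/" + keyword.lower().replace(" ", "_")
    results = []
    for rel in AFFORDANCE_RELATIONS:
        resp = requests.get(
            "https://api.conceptnet.io/query",
            params={"start": term, "rel": f"/r/{rel}", "limit": limit},
        )
        for edge in resp.json().get("edges", []):
            results.append((rel, edge["end"]["label"]))
    return results

print(affordances_for("spoon")[:3])  # e.g. [('UsedFor', 'eating soup'), ...]
```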
5 Method
To inject the knowledge extracted from the ConceptNet graph, we need to identify appropriate subjects of the properties listed in Sect. 4, so that the objects related to a given subject via one of the selected properties can serve as affordances. To this end, we extract keywords from each PIQA question and its possible answers using YAKE [4]. The keywords found are then linked to ConceptNet. If none of the chosen properties is found for a linked entity, we use a definition from Wiktionary [25] as a fallback.

The selected affordances are then passed to a model as part of the input representing a question and answer pair. The affordance (or a definition from Wiktionary) is tokenized and placed after the last [SEP] marker, following the input scheme: [CLS] QuestionTokens [SEP] SolutionTokens [SEP] AffordancesOrDefinitionsTokens. This approach is in line with the original PIQA experiments [2], where each question-solution pair is likewise processed independently and the embedding of the [CLS] token, representing the whole context, is processed by a single feedforward classification layer. We follow the same approach, simply adding affordances to the input so that the [CLS] token is aware of them (Fig. 2). With such preprocessed input, each base model is fine-tuned on the training set and then evaluated on the validation set of PIQA. Preprocessing is done before training begins and is therefore identical for both sets of data.
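To make the input scheme concrete, the following sketch scores a single question-solution pair using the HuggingFace transformers library. This is a minimal sketch, not the paper's exact implementation: the model name is illustrative, and the linear head stands in for the single feedforward classification layer applied to the [CLS] embedding:

```python
# A minimal sketch of scoring one question-solution pair with appended
# affordance knowledge, following the scheme:
# [CLS] QuestionTokens [SEP] SolutionTokens [SEP] KnowledgeTokens
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative
encoder = AutoModel.from_pretrained("bert-base-uncased")
cls_head = torch.nn.Linear(encoder.config.hidden_size, 1)  # scores one pair

def score_pair(goal: str, solution: str, knowledge: str) -> torch.Tensor:
    # Build the flat input; the tokenizer prepends [CLS] automatically.
    text = f"{goal} {tokenizer.sep_token} {solution} {tokenizer.sep_token} {knowledge}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    cls_embedding = encoder(**inputs).last_hidden_state[:, 0]  # [CLS] vector
    return cls_head(cls_embedding).squeeze()

# The predicted answer is the solution whose pair obtains the higher score.
scores = [score_pair("How do I eat soup?", s, "a spoon is used for eating soup")
          for s in ("use a fork", "use a spoon")]
prediction = int(torch.stack(scores).argmax())
```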
6 Evaluation
We grouped affordance injection into four scenarios (a sketch of the corresponding assembly logic follows the list):
- (i) standalone collects as many affordances as possible from all considered properties related to the extracted keywords; these are then connected into sentences and added to the input as text.
- (ii) just first extracts only the first affordance for a given keyword – the one most important for the answer (meaning we iterate over answer keywords first).
- (iii) definition adds both affordances and Wiktionary definitions to the knowledge part of the input, merging the two solutions.
- (iv) complementary adds definitions only when no affordances are available, which happens in about 87.4% of cases. This way, the number of separators in the input always stays the same, with either affordances or definitions in the same place.
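The sketch below illustrates how the four scenarios could assemble the knowledge text appended after the last [SEP]; the function and variable names are ours, not taken from the published code:

```python
# An illustrative sketch of the four knowledge-assembly scenarios.
def build_knowledge(affordances: list[str], definitions: list[str],
                    scenario: str) -> str:
    if scenario == "standalone":      # all affordances, joined as sentences
        return " ".join(affordances)
    if scenario == "just_first":      # only the first (most relevant) affordance
        return affordances[0] if affordances else ""
    if scenario == "definition":      # affordances merged with definitions
        return " ".join(affordances + definitions)
    if scenario == "complementary":   # definitions only when no affordance exists
        return " ".join(affordances) if affordances else " ".join(definitions)
    raise ValueError(f"unknown scenario: {scenario}")
```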
As PIQA provides a separate test set, we evaluate our classifiers on this subset using accuracy as the metric, which is a reasonable choice since the dataset is balanced (for 50% of the examples the first solution is correct, for the remaining ones the second). We compared several popular BERT-based models, as they have proved to be good choices for commonsense reasoning tasks. Some of them, like RoBERTa-large, are available on PIQA's leaderboard for comparison. However, we did not experiment with top-ranked models like GPT-2 and DeBERTa, since they consist of over 1.5B parameters, which makes them hard to fit into GPU memory. Thus, we limit our research to popular baselines.
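Accuracy here is simply the fraction of examples for which the correct solution is chosen; a minimal sketch:

```python
# Accuracy over predicted and gold solution indices (0 or 1 per example).
def accuracy(predictions: list[int], labels: list[int]) -> float:
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)
```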
Table 1 summarizes the accuracy of the various models for the baseline (no affordances), definition, and affordance scenarios. As there are four affordance scenarios, described above, we report the scores obtained in the best scenario. Because Wikipedia was part of each model's pre-training corpus, we can draw an interesting conclusion: adding definitions from Wiktionary (encyclopedic knowledge similar to what was already seen during pre-training) impairs the overall performance of each model. Conversely, affordances seem to improve the results on average, especially where baseline performance is poor, as with the ALBERT model – an improvement of almost 4%. Unlike definitions, which tend to worsen the results, affordances appear to be a good way to inform the model about our physical world. In general, injecting affordances is beneficial: in all tested models the accuracy increased.
An in-depth analysis of the different affordance creation methods is summarized in Table 2. We observe that the methods based solely on affordances are better – for every model, one of the two affordance-only methods obtains the highest accuracy. This observation supports the hypothesis that language models lack certain knowledge conveyed by affordances.
7 Conclusions
We investigated how language models respond to commonsense physical knowledge and how well they understand the subject. To this end, we conducted experiments to determine how incorporating commonsense knowledge into the input of a language model influences the results, contrasting this with ordinary encyclopedic definitions and with results obtained without any additional knowledge. To obtain commonsense knowledge, this work introduces the concept of affordances, extracted from ConceptNet, into machine-learning-based question answering.
We also examined different types of affordances. The paper presents four affordance injection methods, with descriptions, implementations, and a comparison between them. Surprisingly, they all lead to the same conclusion: Wiktionary definition knowledge does not help the models answer the questions – what is more, it usually even worsens the results. Of the methods tested in this paper, only those that rely solely on affordances are of value, namely the one that lists all possible affordances and the one that lists only the single most important affordance. These methods turned out to be the most effective in our experiments. We published the source code online.
References
Beßler, D., et al.: A formal model of affordances for flexible robotic task execution. In: ECAI 2020, pp. 2425–2432 (2020). https://doi.org/10.3233/FAIA200374
Bisk, Y., et al.: PIQA: reasoning about physical commonsense in natural language. In: Proceedings of AAAI, vol. 34, pp. 7432–7439 (2020)
Brown, T.B., et al.: Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020)
Campos, R., et al.: YAKE! keyword extraction from single documents using multiple local features. Inf. Sci. 509, 257–289 (2020)
Cui, L., et al.: Does BERT solve commonsense task via commonsense knowledge? arXiv preprint arXiv:2008.03945 (2020)
Devlin, J., et al.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Floridi, L., et al.: GPT-3: its nature, scope, limits, and consequences. Minds Mach. 30(4), 681–694 (2020)
Forbes, M., et al.: Do neural language representations learn physical commonsense? In: Proceedings of CogSci 2019, pp. 1753–1759. cognitivesciencesociety.org (2019)
Gibson, J.J.: The theory of affordances. In: Shaw, R., Bransford, J. (eds.) Perceiving, Acting, and Knowing: Toward an Ecological Psychology, pp. 67–82. Lawrence Erlbaum, Hillsdale (1977)
He, P., et al.: DeBERTa: decoding-enhanced BERT with disentangled attention. arXiv preprint arXiv:2006.03654 (2020)
Ilievski, F., et al.: Dimensions of commonsense knowledge. arXiv preprint arXiv:2101.04640 (2021)
Lan, Z., et al.: ALBERT: a lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2019)
Lauscher, A., et al.: Common sense or world knowledge? Investigating adapter-based knowledge injection into pretrained transformers. CoRR abs/2005.11787 (2020). https://arxiv.org/abs/2005.11787
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
Lourie, N., et al.: UNICORN on RAINBOW: a universal commonsense reasoning model on a new multitask benchmark. In: Proceedings of AAAI, pp. 13480–13488. AAAI Press (2021)
Lv, S., et al.: Graph-based reasoning over heterogeneous external knowledge for commonsense question answering. In: Proceedings of AAAI, vol. 34, pp. 8449–8456 (2020)
Ostendorff, M., et al.: Enriching BERT with knowledge graph embeddings for document classification. arXiv preprint arXiv:1909.08402 (2019)
Potoniec, J., et al.: Incorporating presuppositions of competency questions into test-driven development of ontologies. In: Proceedings of SEKE 2021, pp. 437–440 (2021). https://doi.org/10.18293/SEKE2021-165
Rajani, N.F., et al.: Explain yourself! leveraging language models for commonsense reasoning. arXiv preprint arXiv:1906.02361 (2019)
Sap, M., et al.: ATOMIC: an atlas of machine commonsense for if-then reasoning. In: Proceedings of AAAI, vol. 33, pp. 3027–3035 (2019)
Shwartz, V., et al.: Unsupervised commonsense question answering with self-talk. arXiv preprint arXiv:2004.05483 (2020)
Speer, R., et al.: ConceptNet 5.5: an open multilingual graph of general knowledge. In: Proceedings of AAAI, vol. 31 (2017)
Speer, R., et al.: ConceptNet 5.5: an open multilingual graph of general knowledge. In: Proceedings of AAAI, AAAI 2017, pp. 4444–4451. AAAI Press (2017)
Talmor, A., et al.: CommonsenseQA: a question answering challenge targeting commonsense knowledge. arXiv preprint arXiv:1811.00937 (2018)
Wales, J.: The Wikimedia community: Wiktionary (2002). https://www.wiktionary.org/. Accessed 10 Oct 2021
Yang, Y., et al.: G-DAUG: generative data augmentation for commonsense reasoning. arXiv preprint arXiv:2004.11546 (2020)
Zellers, R., et al.: SWAG: a large-scale adversarial dataset for grounded commonsense inference. arXiv preprint arXiv:1808.05326 (2018)
Zhu, Y., Fathi, A., Fei-Fei, L.: Reasoning about object affordances in a knowledge base representation. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8690, pp. 408–424. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10605-2_27
Acknowledgement
This research was partially supported by TAILOR, a project funded by EU Horizon 2020 research and innovation programme under GA No 952215.