Implicit Knowledge in Argumentative Texts
In everyday communication as well as in written texts, people omit information that seems clear and evident, such that only part of the message needs to be expressed in words. While such information can easily be filled in by the hearer, a computational system typically does not possess the knowledge needed to reconstruct the implied information. Especially in argumentative texts it is very common that premises are implied and omitted [4, 22, 43]; such arguments are called enthymemes. Thus, the logic of an argument is in general not fully recoverable from what is explicitly said. Regarding our task of argument explicitation, dealing with enthymemes is one of the core challenges. While explicitation based on Toulmin’s model or Walton schemes may be regarded as a tangible aim as long as the problem of implied premises is ignored, we argue that most (informal) natural language arguments are enthymemes, and their explicitation, which includes reconstruction, should not be neglected. The problem of enthymeme reconstruction is arguably AI-complete. Broadly, a system tackling enthymeme reconstruction – called an enthymeme machine – must be able to answer three questions: (i) is the analyzed argument an enthymeme? (ii) which gaps need to be filled? (iii) what are the missing premises? Approaches for addressing questions (i) and (ii) depend on the chosen argument model (e.g., Walton schemes or the Toulmin model). Addressing question (iii) is more challenging, since it can only be achieved with respect to (real-world) knowledge available to the system.
Such real-world knowledge can be: (i) encyclopedic (e.g., The dog was the first animal to be domesticated), which is available online through Wikipedia and related structured knowledge bases such as DBpedia, Wikidata, and YAGO; (ii) ontological (e.g., dogs are animal life), which is available, for instance, through taxonomies and lexicons such as WordNet, as well as Wikipedia-based knowledge bases; (iii) contextual, such as the purpose of the document, the author, the time, etc.; and (iv) commonsense knowledge (e.g., dogs usually bark when strangers enter their space), which is much harder to source. While the first two types of real-world knowledge can be accessed with state-of-the-art entity linking tools, the last two types are more challenging and in general much less researched. The ExpLAIN project focuses in particular on reconstructing the latter – commonsense knowledge – and investigates its role in argumentative texts. Our starting point is the lessons learned from human-generated data that reconstructs missing and implied information in argumentative texts.
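To illustrate how the first, encyclopedic type of knowledge can be retrieved once an entity has been linked, the following sketch assembles a SPARQL query for the abstract of a DBpedia resource. The entity URI and the use of the `dbo:abstract` property are illustrative; this is not a description of our actual pipeline.

```python
# Sketch: building a SPARQL query to retrieve encyclopedic facts about a
# linked entity from DBpedia.  In a real pipeline the entity URI would come
# from an entity linking tool rather than being hard-coded.

def abstract_query(entity_uri: str, lang: str = "en") -> str:
    """Build a SPARQL query for the language-filtered abstract of a resource."""
    return (
        "SELECT ?abstract WHERE { "
        f"<{entity_uri}> <http://dbpedia.org/ontology/abstract> ?abstract . "
        f'FILTER (lang(?abstract) = "{lang}") '
        "}"
    )

query = abstract_query("http://dbpedia.org/resource/Dog")
print(query)
```

The query string could then be sent to a public SPARQL endpoint; the same pattern applies to Wikidata with its own property identifiers.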
Learning from Human-Generated Data
In a recent annotation project [2, 4] on argumentative texts, we elicited high-quality human annotations of implied information in the form of simple natural language sentences. The annotations were performed on pairs of argumentative units from the Microtexts Corpus, a concise and focused argumentation dataset that is annotated with argumentative components and relations such as support, rebuttal or undercut. Annotators were asked to add the information that makes connections between given unit pairs explicit, using short and simple sentences. We designed an annotation process that involves several steps to promote annotator agreement and that allows us to monitor its evolution using textual similarity measures. Fig. 7 shows two examples of such annotations: in the first, the main claim 1‑a is attacked by statement 1‑b; in the second, premise 2‑b supports the main claim 2‑a. In both cases, the knowledge underlying the connection between the main claim and the premise is made explicit in clause c, by insertion of one sentence in the first example and two sentences in the second.
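The monitoring step can be made concrete with a minimal sketch that compares two annotators' reconstructions using token-level Jaccard similarity. The study itself uses its own textual similarity measures, and the sentences below are invented; this only illustrates the principle.

```python
# Sketch: tracking annotator agreement with a simple textual similarity
# measure (token-level Jaccard).  This is an illustrative stand-in for the
# similarity measures used in the actual annotation study.

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two sentences."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

# Hypothetical reconstructions of the same implicit premise by two annotators.
s1 = "dogs usually bark when strangers enter their space"
s2 = "dogs bark when strangers enter their territory"
print(round(jaccard(s1, s2), 2))  # prints 0.67
```

Rising average similarity across annotation rounds would indicate converging annotations.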
To learn more about the nature and characteristics of the inserted information and the overt argumentative texts, we further annotated the data with two specific semantic information types which we found to be characteristic of argumentative texts in two studies [3, 4]: (I) semantic clause types, of which we adopted the most frequent types in  (states, events, generic sentences, and generalizing sentences), and (II) ConceptNet commonsense knowledge relations [49, 50], which comprise an inventory of 37 relations, some of which are commonly used in other resources like WordNet (e.g., IsA, PartOf), while most others target the capture of commonsense knowledge and as such are particular to ConceptNet (e.g., HasPrerequisite, MotivatedByGoal). Two example annotations from our dataset are given in Fig. 8.
Analysis. An in-depth analysis of our annotated German and English data [2, 4] helped us gain insights into the properties of both argumentative texts and implicit knowledge in terms of structural features and semantic information: we found, e.g., that generic sentences predominate among the inserted sentences, indicating the relevance of generic knowledge for implicit information. Almost all sentences in our data – both Microtexts and inserted information – could be mapped to commonsense knowledge relations, which highlights that knowledge bases such as ConceptNet play an important role in argument analysis and are an important source for retrieving implicit knowledge.
Further correlation analysis revealed a number of properties that can guide future systems for automatic reconstruction of implicit information: we found, e.g., that more inserted sentences are needed when no direct argumentative relation holds between units, and that complex argumentative relations such as undercut require more explications than other relation types.
Correlation analysis further identified dependencies between the structure of an argument and the type of knowledge that connects argument pairs: we found, e.g., that states most often occur between units that are adjacent, while events are frequently used for connecting argumentatively related units. Finally, we revealed the importance of causal explanations (as expressed by the relation causes) as implied knowledge between supporting argument units, along with generics.
These insights can inform our future steps towards knowledge-driven automated argument analysis: reconstructing implicit information can make the underlying logic of an argument transparent for computational systems and can be useful for assessing the strength of an argument. By exploiting the observed characteristics of the manually added implicit information – characteristic semantic clause types and commonsense knowledge relations in specific argumentative contexts – we can guide the process of automatically reconstructing implicit information in argumentative texts.
We addressed this next step – from manual towards computational reconstruction of implicit knowledge – by developing classifiers for (I) Semantic Clause Type and (II) ConceptNet Knowledge Relation Prediction – two semantic information types which our analysis has shown to be characteristic for (reconstructed) implicit knowledge in argumentative texts.
Classifying Semantic Clause Types
Detecting aspectual properties of clauses in the form of semantic clause types has been shown to depend on a combination of syntactic, semantic and contextual features. We explore the task in a deep-learning framework, where tuned word representations capture lexical, syntactic and semantic features [6, 7]. Given a clause in its context (previous clauses and previously predicted labels), the model predicts its semantic type (i.e., state, event, generic, generalizing sentence). We apply a Gated Recurrent Unit (GRU)-based architecture that is well suited to modeling long sequences.
This initial model jointly models local and contextual information in a unified architecture and is further enhanced with an attention mechanism that encodes which parts of the input contain relevant information and has been shown to be beneficial for sentence classification and sequence modeling. Our model implicitly captures task-relevant features and avoids the need to reproduce explicit linguistic features for other languages, as it can tune pre-trained embeddings to encode semantic information relevant for the task. It is therefore easily transferable to other languages. We performed experiments for both English and German that achieve competitive accuracy (English: 72.04, German: 74.92) compared to knowledge-rich feature-based models.
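The gating mechanism at the heart of the GRU architecture can be sketched in plain NumPy as a single cell step applied over a toy sequence. The weights here are random stand-ins for the learned parameters, and the input vectors stand in for tuned word embeddings; the actual model adds attention and label context on top.

```python
import numpy as np

# Sketch of a single GRU cell step, to make the gating mechanism concrete.
# All parameters are random placeholders; in the real model they are learned.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde          # interpolated new state

rng = np.random.default_rng(0)
d_in, d_h = 4, 3
params = [rng.normal(size=(d_h, d_in)) if i % 2 == 0
          else rng.normal(size=(d_h, d_h)) for i in range(6)]

h = np.zeros(d_h)
for x in rng.normal(size=(5, d_in)):  # run over a toy 5-step sequence
    h = gru_step(x, h, params)
print(h.shape)
```

The final hidden state would feed a softmax layer over the four clause types (state, event, generic, generalizing).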
Classifying Commonsense Knowledge Relations
Motivated by our analysis of the nature of reconstructed implied knowledge in arguments, we developed a model for classifying commonsense knowledge relations as represented in ConceptNet. Here the task is to predict one (or several) commonsense relations from a set of relation types that hold between two given concepts. We took into account the specific properties of ConceptNet knowledge relations that can make relation classification difficult: a given concept pair can be linked by multiple relation types, and relations can have multi-word arguments of diverse semantic types.
We designed a multi-label classifier (COREC) which uses RNNs for representing multi-word arguments and individually tuned thresholds for improving model performance, especially for relations with unfavorable properties such as long arguments, relation ambiguity and inner-relation diversity. Our best model achieves F1 scores of 0.68 in an open and 0.71 in a closed world setting. To further improve the classifier, we restructured the relation space by separating ambiguous relations and added pre- and post-filtering of concepts to reduce uninformative instances. These modifications improve the classifier performance by +9 pp. to an F1 score of 0.77. The analysis of the results in different configurations shows that the design decisions driven by multi-word representations and threshold tuning improve the overall classification performance, and that our model is able to tackle the specific properties of ConceptNet.
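The idea of individually tuned thresholds can be illustrated with a small synthetic example: for each relation, the decision threshold that maximizes F1 on held-out classifier scores is selected independently. The scores, labels and threshold grid below are invented for illustration; COREC itself operates on RNN-based representations.

```python
# Sketch: per-relation decision thresholds for a multi-label classifier.
# For each relation we pick the threshold maximizing F1 on held-out data.

def f1(gold, pred):
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def tune_threshold(scores, gold, grid=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Return the grid threshold with the best F1 for this relation."""
    return max(grid, key=lambda t: f1(gold, [s >= t for s in scores]))

# Hypothetical classifier scores and gold labels for one relation type.
scores = [0.9, 0.65, 0.4, 0.35, 0.1]
gold = [True, True, True, False, False]
best = tune_threshold(scores, gold)
print(best)  # prints 0.4
```

A relation with long, ambiguous arguments may end up with a different optimal threshold than a clean, frequent relation, which is exactly what per-relation tuning accommodates.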
Commonsense Knowledge Base Completion
Knowledge resources are known to be incomplete, and we expect them to be more effective if they can be enriched dynamically, on the fly, to extend their coverage to novel datasets.
In the ExpLAIN project, we investigate several ways of enriching knowledge bases. We research methods for targeted information extraction that use patterns in the knowledge graph to form more specific queries for completing relation triples. Furthermore, we analyze link prediction methods for knowledge base completion, including studies on the impact of different ways of performing negative sampling and novel ways of representing knowledge graphs in a more compact, abstract way by combining nodes and edges. Finally, we use our COREC classifier to predict missing knowledge relations for enriching ConceptNet in a dynamic, task-targeted way: in the argument relation prediction task (Sect. 3), we predict commonsense knowledge relations that are not yet defined in ConceptNet between pairs of concepts appearing in pairs of argumentative units, and insert the dynamically predicted relations into the subgraphs created for knowledge path extraction. This improves our model scores for Student Essays and Debatepedia.
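The dynamic, task-targeted enrichment step can be sketched as follows. Here `predict_stub` stands in for a relation classifier such as COREC, and the toy graph, concept pairs and predictions are hypothetical; only pairs not yet linked in the subgraph are filled in, and only when the classifier is sufficiently confident.

```python
# Sketch: enriching a ConceptNet-style subgraph on the fly with predicted
# relations.  The graph maps concept pairs to relation labels; `predict` is
# any classifier returning a (relation, confidence) pair.

def enrich(graph, concept_pairs, predict, threshold=0.5):
    """Insert predicted relations for pairs not yet linked in the graph."""
    added = []
    for a, b in concept_pairs:
        if (a, b) not in graph:  # only fill genuine gaps
            rel, conf = predict(a, b)
            if conf >= threshold:
                graph[(a, b)] = rel
                added.append((a, rel, b))
    return added

graph = {("dog", "animal"): "IsA"}  # hypothetical existing subgraph

def predict_stub(a, b):
    # Hypothetical classifier output, standing in for COREC.
    return ("AtLocation", 0.8) if (a, b) == ("dog", "kennel") else ("RelatedTo", 0.2)

pairs = [("dog", "animal"), ("dog", "kennel"), ("dog", "moon")]
new_edges = enrich(graph, pairs, predict_stub)
print(new_edges)
```

The enriched subgraph can then be handed to knowledge path extraction as if the predicted edges had been part of ConceptNet all along.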
Having shown the effectiveness of on-the-fly knowledge base completion for argumentative relation classification, our next step is to apply COREC to the automatic reconstruction of implicit knowledge in model-based micro-explicitation.
This can be achieved straightforwardly, by applying COREC to predict knowledge relations between concepts stemming from different argumentative units. While this works well for establishing direct links, it can be computationally prohibitive for multi-hop relation paths. However, COMET, a commonsense knowledge model built on a pretrained language model, is able to perform target concept prediction for commonsense knowledge relations, given a source concept and a known relation type, and can thus be applied to predict multi-hop paths between concepts by generating intermediate nodes. Since COMET builds on a pretrained language model, we expect it to host knowledge that is complementary to COREC. We therefore plan to combine the two for improved accuracy.
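Multi-hop path construction with a COMET-style target predictor can be sketched as follows. The function `generate` is a stub backed by a hand-written lookup table, standing in for the language-model-based predictor; the concepts, relations and generated targets are all hypothetical.

```python
# Sketch: building a multi-hop knowledge path by chaining target concept
# prediction.  STUB is a hypothetical lookup table standing in for a
# COMET-style generator that maps (source concept, relation) to a target.

STUB = {
    ("stranger", "CausesDesire"): "protect territory",
    ("protect territory", "Causes"): "barking",
}

def generate(concept, relation):
    """Stand-in for language-model-based target concept prediction."""
    return STUB.get((concept, relation))

def multi_hop_path(source, relations):
    """Chain relation-wise target prediction into a relation path."""
    path, node = [source], source
    for rel in relations:
        node = generate(node, rel)
        if node is None:  # generator produced no target for this hop
            return None
        path += [rel, node]
    return path

print(multi_hop_path("stranger", ["CausesDesire", "Causes"]))
```

Generated intermediate nodes like these could then link concepts from different argumentative units that share no direct ConceptNet edge.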