1 Introduction

Public discourse is becoming increasingly dominated by social media, creating a need for robust analysis of arguments raised in computer-mediated communication. The objective of RANT is to explore the possibility of conducting such an analysis in a formally grounded manner despite the high level of both syntactic and semantic variability inherent in the exchanged arguments, based on a large corpus of Twitter messages on the Brexit referendum. In view of the sheer size of the corpus, we follow a high-precision/low-recall approach where we fine-tune corpus queries to extract arguments matching a set of logical patterns (manually developed by inspection of the corpus) and embed the harvested arguments into an expressive logical/argumentation-theoretic framework.

Peculiarities of social media data occur both on a linguistic and on an argumentation-theoretic level.

Linguistic challenges result from the largely unmoderated environment of Twitter, which imposes little structure and few stylistic guidelines on users. Tweets often feature multimodality in the form of links, videos, or pictures, and non-standard language is prevalent. Linguistic phenomena like abbreviations, colloquial style, and typographic errors challenge both standard NLP pipelines and traditional frameworks for extracting argumentation. This leads to poor performance on tasks such as detecting argumentative utterances or splitting claims and premises [9, 13].

Argumentation-theoretic challenges mostly result from the informal nature of conversations on social media. Arguments in day-to-day conversations tend to feature a high degree of implicitness because speakers normally assume common knowledge on the part of their listeners. Whole premises, conclusions, and any intermediate steps of arguments can remain implicit [5]. This effect is amplified by the fast pace of social media communication and especially by Twitter’s limit of 140 characters. Such incomplete arguments are referred to as enthymemes in the literature [28], and making the missing information explicit is difficult even for human annotators [4].

In this setting of incomplete defeasible arguments, persuasion through rhetorical strategies such as the selection, arrangement, or phrasing of argumentative units is highly relevant [27]. Meta-argumentation, such as ad hominem arguments and accusations of fallacies (especially red herring), is ubiquitous. In the particular case of implicit premises, the distinction between fallacies and possibly valid enthymemes becomes blurred – especially for automated analysis.

Formal Argumentation is concerned with representing and reasoning about argumentation in a machine-understandable format. We refer to the Handbook of Formal Argumentation [3] for a general introduction; a brief outline of the relevant aspects is given in the following.

In RANT, formalisms for representing arguments extracted from our data are particularly important. As social media are a rich source of relevant metadata, formalisms based on the argument interchange format (AIF) like the social AIF (S-AIF) [16], AIF plus (AIF\({}^{+}\)) [21], and inference anchoring theory (IAT) [6] are especially interesting because they can directly represent dialogue and speakers.

In the AIF and related formalisms, argumentation is represented as a graph with different types of nodes carrying corresponding meanings. Edges are also modelled as nodes in order to avoid the need for hyper-edges. The AIF was designed to represent monologues and was later extended to AIF\({}^{+}\), where locutions are related through dialogue moves [18, 21], in order to incorporate dialogical argumentation. The extension to AIF\({}^{+}\) also brought support for representing IAT, i.e. for incorporating illocutionary force into argument maps, relating locutions and illocutions as well as inferences [20]. Furthermore, the S-AIF incorporates the authors of locutions into the model for enhanced dialogue analysis and statistics.
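For concreteness, a tiny argument graph in this node-based encoding might be laid out as follows. This is a minimal sketch: the node types I (information) and RA (inference application) follow common AIF terminology, but the data layout and example texts are our own illustration, not any official AIF serialisation.

# Minimal AIF-style graph: information nodes (I) hold propositional content,
# while relations such as inference (RA) or conflict (CA) are reified as
# scheme nodes, so no hyper-edges are needed.
nodes = {
    "i1": {"type": "I", "text": "The UK should remain in the EU"},
    "i2": {"type": "I", "text": "Leaving would harm the economy"},
    "s1": {"type": "RA"},  # inference node: i2 is a premise for i1
}
# Plain directed edges; argumentative relations always pass through a scheme node.
edges = [("i2", "s1"), ("s1", "i1")]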

We plan to utilise a logic reasoner as part of our argument map construction pipeline in order to automatically draw some inferences from illocutions alone. To this end, we incorporate ideas from structured argumentation, such as labelling nodes with logical formulae. Among other advantages, this will allow us to construct argument maps from a “soup of utterances” on Twitter, where extracting argumentative information from the dialogue context is impossible.

Due to the peculiarities of argumentation on social media and especially on Twitter, existing argumentation formalisms require substantial adaptation and extension to be applicable in this context. For instance, most formalisms have no support for meta-argumentation like ad hominem arguments or accusations of fallacies. Especially the former phenomenon is very common on social media, although itself a well-known fallacy, and hence merits consideration in the choice of formalism. Moreover, only very few formalisms have a convenient representation of enthymemes.

2 Data and Corpus Queries

2.1 Data

Our dataset is based on roughly 23 million tweet IDs containing the string “Brexit” collected between May 5 and August 24, 2016. We downloaded all available tweets via the official Twitter API (which amounts to roughly 20 million tweets) and processed the dataset as follows.

Firstly, we only consider original tweets (no retweets), which amount to approximately half of all successfully downloaded tweets; the distribution of originals is shown in Fig. 1. Secondly, we only keep tweets identified as English by Twitter, which leaves more than 7 million of the original tweets; the most frequent other languages were Spanish (655,799), French (314,609), German (200,066), Italian (190,530), and undefined (160,756). Thirdly, for the development phase, we restricted the dataset to tweets posted before the referendum on June 23, 2016, aiming at a relatively consistent dataset and expecting substantial differences between arguments presented before and after the referendum. Our final implementation will later be applied to the full dataset, enabling the comparison of argumentation patterns over time.
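These three steps amount to a simple filter cascade. The following minimal sketch illustrates the logic in Python, assuming tweets are given as dictionaries with the field names used by the Twitter API (retweeted_status, lang, created_at); it is an illustration of the procedure, not our actual processing code.

from datetime import datetime, timezone

REFERENDUM = datetime(2016, 6, 23, tzinfo=timezone.utc)

def keep_for_development(tweet: dict) -> bool:
    """Filter cascade of Sec. 2.1 (illustrative sketch)."""
    if "retweeted_status" in tweet:      # (1) originals only, no retweets
        return False
    if tweet.get("lang") != "en":        # (2) tweets identified as English by Twitter
        return False
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return created < REFERENDUM          # (3) development phase: pre-referendum only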

Fig. 1 Number of original English tweets in our database by day. The dashed bar indicates the day of the referendum (23rd June 2016); only tweets posted before the referendum are included in our corpus

Fig. 2 Example reply thread with several linear sub-threads and independent responses

In addition, we resolved reply-threads by retrieving all available tweets for which there is a reply in our dataset, again excluding non-English tweets. In this way, we can access dialogues between users, which are more likely to contain arguments. An example of such a reply thread is shown in Fig. 2. Note that our final database therefore contains tweets sent before May 5, 2016, and tweets that do not contain the search string “Brexit”. There are 215,744 reply-threads involving 688,905 tweets in our dataset; 85% of them (183,188) could be resolved to their root.
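Resolving a thread amounts to repeatedly following reply links upwards until a tweet without a parent is reached. A minimal sketch, assuming a precomputed mapping from tweet IDs to the IDs they reply to (the data layout is an illustrative assumption):

def resolve_root(tweet_id, parent_of):
    """Follow in_reply_to links upwards until the root of the thread.

    parent_of maps a tweet ID to the ID it replies to (None for roots);
    returns the root ID, or None if the chain cannot be resolved because
    a referenced tweet could not be retrieved.
    """
    seen = set()
    while tweet_id in parent_of:
        if tweet_id in seen:          # guard against malformed cyclic data
            return None
        seen.add(tweet_id)
        parent = parent_of[tweet_id]
        if parent is None:            # no parent: this tweet is the root
            return tweet_id
        tweet_id = parent
    return None                       # parent tweet missing from the dataset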

Last but not least, we excluded near-duplicates (most of them likely generated by social bots – see [25] for details on our deduplication heuristics) as long as they are not part of a reply-thread. This way, we only consider genuine original content. Our final corpus consists of approximately 2.4 million tweets.
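The actual deduplication heuristics are described in [25]; purely for illustration, a much simplified variant based on normalised text keys could look as follows (helper names and the tweet layout are our own assumptions):

import re

def dedup_key(text):
    """Crude normalisation key for near-duplicate detection (illustration only):
    lower-case, strip URLs and @-mentions, collapse whitespace."""
    text = re.sub(r"https?://\S+", "", text.lower())
    text = re.sub(r"@\w+", "", text)
    return re.sub(r"\s+", " ", text).strip()

def remove_near_duplicates(tweets, thread_ids):
    """Keep the first tweet per key, but never drop members of a reply-thread."""
    seen, kept = set(), []
    for tweet in tweets:
        key = dedup_key(tweet["text"])
        if tweet["id"] in thread_ids or key not in seen:
            kept.append(tweet)
            seen.add(key)
    return kept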

2.2 Linguistic Annotation

We used off-the-shelf software tools for tokenization and coarse POS tagging [19] and a custom lemmatizer based on work by Minnen et al. [17]. POS taggers categorize words according to their parts of speech, e.g. verb or noun. Lemmatizers group together inflected forms of the same stem (e.g. takes, took, taken are all mapped to the lemma take). We additionally ran a tool for phrase chunking and named entity recognition (NER) [22, 23] – which also tags tokens with around 50 fine-grained POS tags in Penn Treebank style – combining the different tokenization layers in a post-processing step. Such a linguistically enriched corpus is an important prerequisite for formulating precise queries, cf. Sec. 2.3.

The annotated corpus has a total size of 32 million tokens. Unsurprisingly, the most frequent word forms are Brexit and the corresponding hashtag #Brexit, which together make up around 4% of all tokens, followed by function words such as determiners, punctuation marks, and prepositions. The most frequent content lemmas (after brexit and the auxiliary be) are vote and eu (both about 0.8% relative frequency). The coarse-grained POS system tagged approximately one third of all tokens as verb or noun, followed by proper nouns, prepositions, punctuation marks, determiners, adjectives, hashtags, URLs, pronouns, and adverbs. The NER system detected around 10 million noun phrases, 4 million verb phrases, 3 million prepositional phrases, and 2 million named entities. The annotation of a typical tweet is shown in Table 1; this tweet contains a match of the query presented in Sec. 4.

Table 1 Linguistic annotation of a tweet sent on June 17, 2016 (ID: 743726246568173568), in vertical format. The tokenized text can be read from top to bottom in the first column. The following two columns show the fine-grained and the coarse-grained POS tags, respectively; lemmas are displayed in the last column. Phrase chunking and named entity/event recognition are displayed on the right-hand side; the last verb phrase was recognised as an event vote. The query match is indicated by a vertical bar on the left-hand side, with anchor points and the respective regions next to it (cf. Sec. 4)

2.3 Query Architecture

Our corpus queries serve to extract argumentation from the data (cf. Sec. 4). Having queries as the central element of extraction allows us to combine lexical and grammatical patterns with word lists. At the same time, the formulation of explicit queries incorporating a fixed linguistic structure allows us to handle the noisy data prevalent in social media, as the queries can capture typical phenomena on the level of syntax, vocabulary and phraseology.

Our query architecture builds on the IMS Corpus Workbench (CWB) [10], a system designed for enabling complex linguistic searches on large corpora. The query language is based on regular expressions and allows for the incorporation of various levels of annotation. All grammatical information added to the corpus during pre-processing (cf. Sec. 2.2) can be accessed for each individual word or region – for instance, [pos="N"] will retrieve any word identified as a noun by the POS tagger, while [lemma="have"] finds all forms of have (have, has, had, having). Similarly, phrase chunks like <np>…</np> specify a sequence tagged as a noun phrase. These elements can be freely combined: <np> [pos="N"]+ </np> matches a noun phrase consisting only of one or more nouns. For initial query development, we used the web-based concordancing front-end CQPweb [15], allowing us to browse query results, view and sort context displays, and perform statistical analyses.

To support the particular needs of RANT, we implemented our own Python wrapper around CWB and developed a bespoke web application to manage word lists and queries, and to display results in a way tailored to the needs of argument extraction. In comparison to CQPweb, our app places its central focus on managing multiple query patterns rather than on individual queries and their statistical properties. This is achieved in particular by supporting complex macros and allowing the user to build and semi-automatically expand word lists.
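To illustrate how word lists plug into queries, the following sketch expands a list into a CQP alternation; the list names and entries are taken from the example in Sec. 4, while the expansion helper itself is our own simplification of what the app does.

WORD_LISTS = {
    "profession": ["scientist", "historian", "economist"],
    "common_people": ["person", "dude", "gal"],
}

def expand(name):
    """Expand $name into a CQP alternation over the list entries."""
    return '[lemma="({})"]'.format("|".join(WORD_LISTS[name]))

# A noun phrase whose last element comes from the $profession list:
query = "<np> []* {} </np>".format(expand("profession"))
print(query)  # <np> []* [lemma="(scientist|historian|economist)"] </np>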

3 Logical Models

3.1 Argument Representation

As mentioned in Sec. 1, argument representation formalisms need to be substantially amended to meet the challenges of social media argumentation. This led us to develop the trichotomic argumentation framework (T-AF) [14].

Fig. 3 Exemplary Twitter conversation on the Brexit referendum. AfC: Argument from Commitment; AfE: Argument from Evidence; AfS: Argument from Source. Arguments are represented as formulae in suitable modal logics, in the present example featuring the modalities \(\mathbf{F}^{-1}\) (‘at some point in the past’), \(\mathbf{G}\) (‘always in the future’), \(\mathbf{K}_{a}\) (‘agent \(a\) knows that’), and \(\mathbf{C}_{a}\) (‘agent \(a\) can’)

The part of a T-AF graph depicted in Fig. 3 is essentially (up to the graphical presentation) an IAT/AIF\({}^{+}\) representation of a conversation, featuring hyper-edges labelled with inference schemes between nodes labelled with formulae in modal logic. We have left out a further part containing the speakers and their relations, as this is currently not the focus of our argument extraction pipeline.

These deviations from existing formalisms mitigate the following problems posed by social media argumentation mining and representation:

Enthymemes

are incorporated by representing all relevant illocutions as logical formulae and adding inferences between them whenever the inference is clear from context, even if it does not constitute a fully explicit argument. This relies on the assumption that active participants of the discussion either understood the inference or would have asked for clarification, thereby leveraging human inference capabilities that an argument mining pipeline lacks.

Argument schemes

are annotated where recoverable, as identifying such schemes can be beneficial when trying to recover the further structure of arguments [11].

Meta-arguments

can be represented via flexible inference edges, which can attack entities (ad hominem arguments) and other relations (relevance attacks) in addition to illocutions.

Uncertainty

is incorporated at multiple points in our formalism. At locution interpretation edges, we cover the case where locutions can be interpreted in different ways by linking to multiple alternative illocutions. At the illocutions, we can use similarity-based reasoning to reason about inferences in the presence of bad spelling, abbreviations, or different references to the same entity. Alternatively, one could use suitable modalities to incorporate uncertainty directly in the formulae.

3.2 Coalgebraic Logic

As apparent in Fig. 3, the representation of real-world arguments involves a wide variety of modal operators. Recall that modal logics (including most current description logics [2]) are traditionally equipped with a relational semantics. This setup does fit some of the modal operators involved, such as the temporal operators featuring in Fig. 3. On the other hand, the semantics of modalities of knowledge and ability often lives outside the relational world, and involves, e.g., neighbourhood systems or game structures (e.g. [1, 26]). Moreover, relational semantics certainly does not suffice for uncertainty, vagueness, or defaults, which instead require probabilistic, fuzzy, or preferential structure in the semantics.

Coalgebraic logic has emerged as a common framework for logics featuring such extended modalities [7]. It is based on casting semantic models as coalgebras \(X\to FX\) for a given functor \(F\) in the paradigm of universal coalgebra, with the basic example of such a functor being the powerset construction \(F=\mathcal{P}\) on sets, whose coalgebras reproduce the base case of relational systems. Further standard examples of coalgebras include probabilistic systems, preferential structures, and game structures. Coalgebraic logic thus supports a wide range of non-standard modalities such as probably/with probability at least \(p\), if – then normally, or \(X\) can enforce. Modularity results in coalgebraic logic [24] allow for unrestricted combination of such modalities in the sense of a fusion of modal logics, in formulae such as \(\mathsf{ParliamentaryDemocracy}\Rightarrow[\mathsf{Parliament}]\,\mathsf{NewExecutive}\) ‘in a parliamentary democracy, parliament can normally force a change of executive’ (unless exceptional situations occur such as an irregular suspension of parliament). The above-mentioned logical patterns are thus essentially formulae in coalgebraic logic with placeholders for subformulae. Reasoning support is provided, either off-the-shelf or by easy implementation of further instance logics, via the generic Coalgebraic Ontology Logic (COOL) reasoner [12]. An important desideratum for further research is to provide support for similarity reasoning (e.g. [8]) at this level of generality, to ameliorate problems caused by deviations in vocabulary and phrasing.
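As a further illustration of such a combination (our own example, not drawn from the corpus or from [24]), a default conditional can be fused with a probabilistic modality, writing \(\Diamond_{\geq p}\) for ‘with probability at least \(p\)’:

$$\mathsf{NoDeal}\Rightarrow\Diamond_{\geq 0.7}\,\mathsf{TradeBarriers}$$

read ‘a no-deal exit normally leads, with probability at least 0.7, to trade barriers’; the fusion of the two modal logics is covered by the modularity results mentioned above.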

4 Argument Extraction

Our current inventory consists of 25 formal patterns representing logical constituents in everyday argumentation, such as position to know arguments:

$$({?0:entity})\Rightarrow\downarrow x.\,K_{x}({?1:entity})$$

Roughly speaking, the above example pattern states that ‘\(?0\) are in a position to know \(?1\)’, often phrased in the form ‘As a \(?0\), I know \(?1\)’ (for brevity, we elide the implied claim ‘I am a \(?0\)’.) Formally, the \(\Rightarrow\) connective is a default conditional ‘if – then normally’; the standard \(\downarrow\)-binder of hybrid logic binds the name \(x\) to the current individual (‘I’); and \(K_{x}\) is an epistemic modality, read ‘\(x\) knows that’. The latter form of epistemic modality poses new challenges for logical reasoning, being indexed by a variable individual \(x\). The example match (Table 1) to the query described next thus yields, slightly abbreviated, the complete formula \(\mathsf{scientist}\Rightarrow\downarrow x.\,K_{x}\,\mathsf{value\_of\_collaboration}\). Because our logical formulae are more abstract than their linguistic representations, we have more queries than patterns, with a current set of 67 corpus queries. Queries are designed to capture as many instances of a given pattern in a particular linguistic context as possible, while maximising precision. One of the queries associated with the pattern above, formulated in the syntax described in Sec. 2.3, is

[Fig. b: corpus query associated with the position to know pattern; its components are described in the following]

For each query, automatically annotated elements on the grammatical level are combined with custom-designed re-usable macros and word lists. In the example, the entity claiming expertise is designated by a noun phrase (<np>) containing an element from one of four word lists describing persons. For instance, $profession includes entries like scientist, historian, economist, while $common_people contains generic person terms (person, dude, gal). pos_ner references the fine-grained part-of-speech annotation, allowing us in particular to filter out modal verbs and infinitive markers (MD, TO), while the tags assigned by pos_ark capture coarser categories (N = noun, Z = name, O = pronoun, …). The macro /be_ap[] captures a form of be followed by an adjective phrase (am sure, is certain). Elements beginning with an @ followed by a number and a colon are target markers, which allow us to extract individual words or ranges (i.e. the words between two target markers) corresponding to a particular slot in the logical formula. In our example, entity \(?0\) is expressed by the region from @0: to @1:, while \(?1\) is expressed by the region from @2: to @3:.
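Once a query has matched, the target markers determine which token ranges instantiate the slots of the logical pattern. A minimal sketch of this slot-filling step follows; the data layout and helper name are illustrative assumptions, not our pipeline's actual interface.

def instantiate_pattern(tokens, anchors):
    """Fill the pattern slots from a query match.

    tokens is the tokenized tweet; anchors maps marker numbers to token
    positions as reported by the query engine. Slot ?0 spans the region
    from @0: to @1:, slot ?1 the region from @2: to @3:.
    """
    return {
        "?0": " ".join(tokens[anchors[0]:anchors[1]]),
        "?1": " ".join(tokens[anchors[2]:anchors[3]]),
    }

# For the tweet of Table 1, this yields roughly
# {"?0": "scientist", "?1": "value of collaboration"},
# which instantiates the pattern to the formula given above:
# scientist => (down)x. K_x value_of_collaboration.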

Evaluation In a preliminary evaluation study, we have manually checked the matches of a different query (associated with a concept similarity pattern) in order to assess our deduplication algorithm and to determine initial precision results. Automatic near-duplicate detection identified 33 duplicates among 99 matches; there were no false positives, but 12 false negatives. The large number of duplicate tweets skews the precision results to some extent: In the raw corpus, 17 out of the 99 matches were false positives (83% precision); in the manually deduplicated corpus, precision decreases to 72% (15 false positives out of 54 unique matches).
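Spelled out, with precision computed as the fraction of true positives among all matches:

$$\text{precision}_{\text{raw}}=\frac{99-17}{99}\approx 83\%,\qquad\text{precision}_{\text{dedup}}=\frac{54-15}{54}\approx 72\%$$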

5 Conclusions

RANT follows a corpus-linguistic approach to extract argument patterns from a large dataset and to transform matches and the associated argumentation structure into a combined logical and argumentation-theoretic formalism whose development constitutes part of the project work. Ongoing work aims to extend the logical and structural expressiveness of the framework and to increase the coverage and precision of the queries. An additional issue to be addressed is the development of suitable evaluation measures; in particular, measuring recall remains challenging.