1 Introduction

This paper addresses a central question within research on argumentation, namely: What makes a good argument? [29, 41, 43,44,45]. The literature so far has established that the quality of an argument has many dimensions, which pertain to the content of the arguments themselves as well as to their rhetorical “packaging”. In our project Visual Analytics and Linguistics for Capturing, Understanding, and Explaining Personalized Argument Quality (CUEPAQ), we have built on our expertise in the linguistic analysis of argumentation [11, 14, 38] to explore the hypothesis that argument preferences are, in fact, often more subjective than the current state of the art in the literature leads us to believe (cf. [41]). More concretely, the project focuses on the effect of linguistic features on personalized argument preferences.

For this, we have developed a new application, the CUEPipe. This pipeline allows researchers to generate data sets for assessing personalized argument preferences as well as annotating these data sets for argument preference. Expecting different results from different annotators, we also provide a platform for exploring personalized argument models learned from the annotations. Thus, the CUEPipe allows linguists to investigate argument preferences, including our claim that argument preferences are, to some extent, subjective. In this paper, we describe three major components of the application:

  i. An interface for generating a corpus of arguments and exploring its linguistic feature diversity
  ii. An interface for labeling pairwise comparisons between arguments
  iii. An interface for exploring personal argument preferences

We illustrate each of these steps based on a proof-of-concept use case by reporting our own experiences with the application and the results of a user study tailored towards testing the visual interactive labeling aspect of the application and the exploration of personal argument preferences. Our declared goal was to explore whether the way we attribute beliefs to different entities affects how we perceive the corresponding arguments. We do this by looking at how propositional attitude verbs affect argument preferences.

The paper is structured as follows: In the next section, we describe the concepts explored in CUEPAQ in more detail. In Sect. 3, we describe how these concepts relate to the CUEPipe, and in Sect. 4, we describe our pilot use case involving user studies. Section 5 discusses limitations, and Sect. 6 concludes.

2 Background

One main goal of our research is to investigate the impact of linguistic features on argument preferences in a controlled manner. To achieve this, we drastically simplify the complexity often attributed to the structure of arguments, as becomes apparent when investigating the topic of argumentation schemes (e.g., [23, 31, 46, 47]). As such, we rely on the simple idea that “Argumentation is aimed at increasing (or decreasing) the acceptability of a controversial standpoint” [42, p. 4].Footnote 1 In the next section, we motivate this decision.

2.1 Argument Data

We treat arguments as tuples (premise, conclusion, relation), following basic (computational) argumentation schemes [30]. The premise and the conclusion are unmodified linguistic expressions.Footnote 2 They stand in a specified relation, which is taken to be a member of the set {support, attack}. The support relation indicates that the premise increases the acceptability of the conclusion, while the attack relation indicates that it decreases the acceptability of the conclusion. An example is illustrated in (1).

figure a
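For concreteness, the following is a minimal sketch of how such an argument tuple could be represented in code. The class and the example content (adapted from the piracy example in Sect. 4) are our own illustration, not the CUEPipe's internal data model.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)
class Argument:
    """An argument as a (premise, conclusion, relation) tuple."""
    premise: str
    conclusion: str
    relation: Literal["support", "attack"]

# Illustrative instance, adapted from the example discussed in Sect. 4.
arg = Argument(
    premise="3 out of 50 lawyers believe that piracy is theft.",
    conclusion="Piracy is theft.",
    relation="support",
)
```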

One of the reasons to focus on such simple arguments is to enable the contrastive study of linguistic features by means of minimal pairs. Minimal pairs originated in the linguistic study of sounds, where they help determine distinctive classes, for example, the phonemes of a language. Beyond phonology, the concept has been applied to other kinds of minimal pairs, prominently syntactic minimal pairs, which have been used, for example, in language acquisition research [10, 20]. Similarly, minimal pair data has been used to judge the linguistic abilities of machine and deep learning systems (see, e.g., [27, 48]). Our goal is to investigate whether minimal changes affect the judgment of argument preferences. More concretely, our corpus helps explore how minimal changes in the premise affect the acceptability of the conclusion of an argument. A typical minimal pair in our corpus is exemplified by (1) vs. (2). As can be seen, our minimal pairs are based on the choice of lexical items that make up an argument. The term minimal refers to the addition, removal, or change of at most one word.

figure b
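One possible way to operationalize this notion of minimality is a token-level check that two premises differ by the addition, removal, or change of at most one word. The sketch below is our own illustration, not the corpus tooling itself.

```python
from difflib import SequenceMatcher

def is_minimal_pair(premise_a: str, premise_b: str) -> bool:
    """True if the two premises differ by at most one added, removed,
    or changed token. Identical strings trivially pass the check."""
    a, b = premise_a.split(), premise_b.split()
    edits = 0
    for op, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if op == "equal":
            continue
        # Count each changed block by the larger of its two sides,
        # so a one-word substitution counts as a single edit.
        edits += max(i2 - i1, j2 - j1)
    return edits <= 1

print(is_minimal_pair(
    "3 out of 50 lawyers believe that piracy is theft",
    "3 out of 50 lawyers discover that piracy is theft"))  # True
```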

As discussed in Sects. 3 and 4, these minimal pairs help provide balanced corpora for research into individual preferences as tied to linguistic features.

2.2 Argument Preferences

According to recent work on the assessment of argument quality, argument preferences are affected by various dimensions [43, 44]. However, these dimensions have mainly been used to assess objective argument quality. [41] acknowledge the subjectivity of rating argument quality but do not explore it further. Our approach builds on the efforts by [16, 36, 37] and similar work to evaluate how convincing arguments are, and we treat argument preference as a single value derived from a function over the linguistic feature values of an argument.

We collect pairwise comparisons of arguments to train models that learn this function (e.g., [16]). More concretely, our approach is based on [36] (see also [16, 43]). This means we train argument preference models based on Gaussian Process Preference Learning (GPPL). We chose this model since it is particularly well suited to working with sparse data. Furthermore, [37] opens up new possibilities for future research by simultaneously including annotations from multiple users. As we are primarily interested in the impact of linguistic features on model performance, we focus on linguistic features for assessing argument preferences (e.g., [5, 29, 44]) when training these models.

Based on the collected argument preferences and the models trained on them, we can develop user profiles that explain the linguistic preferences of users. For this, two strategies are pursued in this paper: i) analyzing the feature importance scores that a model assigns during training, and ii) analyzing the most and least preferred arguments of a user using register analysis methods [4].

2.3 Visual Analytics for Linguistics

We integrate diverse methodologies from the domain of Visual Analytics [24] to support argument and model exploration as well as user engagement in the procedural stages of the CUEPipe. We draw upon expertise from prior studies in the field of natural language exploration [5, 11]. Specifically, we derive methodologies from the field of visual data collection [12, 25, 33] to support the process of corpus annotation. We further integrate a new visual interactive labeling component derived from [2, 3, 34] for annotating argument preferences. Finally, we introduce a dashboard designed for the examination of preference models by introducing a new radial evaluation technique based on former approaches to user-centric visualization [8, 18, 34, 35], thus adding to the growing body of work on LingVis: Visual Analytics for Linguistics [1, 7].

3 The CUEPAQ Argument Exploration Pipeline

Our CUEPipe is a web-based application providing graphical user interfaces for various tasks related to the linguistic modeling of argument preferences. In this section, after introducing the overall workflow, we present the individual components, describing their basic functionalities and intended applications.

Fig. 1.
figure 1

The CUEPipe workflow

3.1 The CUEPipe Workflow

Figure 1 shows the overall workflow of the system. CUEPipe provides various interfaces (I) for collecting argument data. The (V)isualizations described in Sect. 3.2 provide an intuitive overview of the data set, allowing for its exploration. As described in Sects. 3.3 and 3.4, the labeling process and the exploration of argument preferences are also supported by separate interfaces and visualizations.

Furthermore, the workflow in Fig. 1 highlights the different roles of entities interacting with the CUEPipe. It provides access to an extendable argument corpus. However, it is best used to study specific linguistic cues in a targeted data set. Thus, the first important role is that of the linguist. The linguist formulates a hypothesis and defines an expected outcome of the study. Then they generate a data set accordingly. Correspondingly, they may choose to specify a feature set that focuses on the attributes of interest.

The next step is conducting the study. The second role, that of the users, comprises the target group. Here, the subjective nature of argument preferences comes into play. The user group of a study can be categorized along different dimensions, e.g., demographic features such as age, gender, or income, depending on the goal of the study and the corresponding hypothesis. The task of the user group is to compare arguments pairwise in order to create a model that captures their argument preferences reasonably well, as described in Sect. 3.3.

Finally, the role of the analyst is to interpret the resulting preference models and the insights they provide on the user group, e.g., finding clusters. The analyst has a dual role, as they should inform both the linguist and the users. Concerning the users, the goal is to teach them about their argument preferences by analyzing the features that play a role in their preference models and by comparing their models with others. With respect to the linguist, the analysis needs to communicate the actual outcome of the study, including information about model performance and other factors that might affect the reliability of the study. This forms a feedback loop: depending on the study’s outcome, the linguist may want to revise their hypothesis or tweak other variables, such as the feature space used or the argument set. If the result confirms the hypothesis, the linguist still needs to evaluate the created models carefully to ascertain that the results are reliable.

The best use for the CUEPipe may be for prototyping studies to make sure that a more detailed investigation is warranted. However, it also allows linguists to expand on a study incrementally. In principle, the different elements are modular, allowing for individual use, too.

Table 1. Argument distribution in the CAP

In the next few sections, we present the individual steps in Fig. 1 (collection, labeling, and analysis) in more detail, focusing on their implementation.

3.2 Generating a Data Set for Exploring Argument Preferences

The CUEPipe provides a graphical user interface for adding arguments to the Comparable Argument Corpus (CAP) we have developed. The main innovation of the CAP is that it allows adding minimal variations of arguments that contain contrasting lexical items. Thus, the interface is designed to provide a view for adding arguments, a view for varying arguments, and a general argument view that groups arguments and their variations to provide a high-level overview.

Data Collection: The corpus is divided into three levels: new arguments, staging arguments, and corpus. This distinction exists mainly for quality control. Arguments, as well as their variations, must adhere to the general structure described in Sect. 2.1, (premise, conclusion, relation); they must be linguistically adequate (i.e., no nonsensical strings, etc.); and the relation between premise and conclusion must be conceivable (thus, all arguments are assumed to surpass a certain argument quality threshold). After submission to new arguments, two additional data collectors have to confirm these requirements by promoting arguments to staging and corpus, respectively. Consequently, three distinct experts confirm each argument as suitable for the corpus.Footnote 3 Table 1 describes the current size of the corpus. Variation ratio refers to the average number of variations per argument. Unique standpoints refers to the number of unique conclusions, indicating the topic variation in the corpus. Since the goal is to focus on the effects of linguistic features on argument preferences, we aim to provide a varied data set that allows the creation of test sets for various topics.
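The following sketch illustrates this three-stage promotion logic. The level names follow the text, but the class, method names, and reviewer tracking are our own assumptions rather than the CUEPipe implementation.

```python
from enum import Enum

class Level(Enum):
    NEW = 0
    STAGING = 1
    CORPUS = 2

class CorpusEntry:
    """An argument moving through the quality-control stages."""

    def __init__(self, argument, submitted_by):
        self.argument = argument
        self.level = Level.NEW
        self.reviewers = {submitted_by}  # everyone who has confirmed it so far

    def promote(self, reviewer):
        """Advance one level; each promotion needs a new, distinct expert,
        so an argument in the corpus has been confirmed by three people."""
        if reviewer in self.reviewers:
            raise ValueError("Each stage must be confirmed by a different expert.")
        if self.level is Level.CORPUS:
            raise ValueError("Argument is already in the corpus.")
        self.reviewers.add(reviewer)
        self.level = Level(self.level.value + 1)
```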

Each argument is annotated with metadata, including relations between arguments (i.e., whether an argument is a variation of another or an original argument), an author label for variations, and a source label for original arguments.Footnote 4 To keep the structure simple, variations are not nested in the corpus: a variation has no variations of its own (although it may incidentally constitute a variation of another variation of the original).

Linguistic Feature Annotations: Each argument in the main corpus is annotated with linguistic features to allow for the exploration of personalized argument preferences. For this, we use several automated feature annotation pipelines. Some of these were borrowed from other work, e.g., [11, 39] and [29], while other features have been implemented specifically for the CUEPipe. Particularly relevant for the CUEPipe are features introduced by lexical items, including the concrete use of certain items and their additional properties. Examples are embedding verbs, noun and verb modifiers, and different types of negation (verb vs. noun). As an example of additional annotations related to concrete lexical items, we use the semantic parser by [21] to distinguish different kinds of intensionality (veridical, averidical, and anti-veridical). Overall, the application supports 66 linguistic features, ranging from stylistic to semantic. These features are organized into feature groups that give an intuitive understanding of their expected role in analyzing personal argument preferences.
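To give an impression of what such automated annotations can look like, here is a sketch of a feature extractor for a few of the lexical features mentioned above. The choice of spaCy with the en_core_web_sm model, the small verb lexicon, and the feature names are our own assumptions for illustration; they are not the CUEPipe pipelines, which among other things rely on the semantic parser by [21].

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Tiny illustrative lexicon; the real pipelines cover a much richer
# inventory (66 features in total, from stylistic to semantic).
EMBEDDING_VERBS = {"believe", "think", "claim", "discover", "show", "agree"}

def annotate(argument_text: str) -> dict:
    """Extract a handful of lexical/syntactic features from an argument."""
    doc = nlp(argument_text)
    return {
        "embedding_verb_count": sum(
            t.lemma_ in EMBEDDING_VERBS and t.pos_ == "VERB" for t in doc),
        "verb_negation_count": sum(            # negation attached to a verb
            t.dep_ == "neg" and t.head.pos_ == "VERB" for t in doc),
        "noun_modifier_count": sum(            # adjectival modifiers of nouns
            t.dep_ == "amod" for t in doc),
        "token_count": len(doc),
    }

features = annotate("3 out of 50 lawyers believe that piracy is theft.")
```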

Corpus Exploration: In addition to the corpus management functionality, we provide a visual exploration dashboard to interact with the data in the corpus. This component primarily serves to inspect feature distributions and interactions in the corpus. It consists of three parts: the argument similarity map and the global and local co-occurrence matrices.

Fig. 2.
figure 2

Feature exploration

The argument similarity map, as the name suggests, maps arguments onto a two-dimensional space as circles, distributing them according to their similarity based on their annotated features. For this, we use an off-the-shelf dimensionality reduction (principal component analysis, PCA; [17]) to reduce the linguistic feature vectors to two dimensions. The map can be customized for selected feature combinations, so that the distributions of different feature categories can be evaluated according to the analyst’s interest. Moreover, different feature sets, or individual features, can be mapped onto the x- and y-axes of the map. As shown in Fig. 2, the selected features for each axis are reduced to one dimension each. This allows linguists to compare the distribution of features or feature groups in relation to the overall complexity of the corpus. Figure 2 illustrates this by presenting the distribution of the feature averidical-ratio relative to the full feature set. As the picture suggests, many arguments in the selected set do not indicate averidicality; those that do, however, are spread fairly evenly across the data set.Footnote 5 Linguists can select arguments of interest, such as argument clusters, outliers, or arguments with a certain feature value, for closer inspection to refine the information provided by the argument similarity map. As shown on the right of Fig. 2, researchers then see global and local feature co-occurrence matrices. As the name suggests, these visualizations present feature collocations. The global matrix (upper right corner) displays pairwise feature interactions within the selected subspace. Darker shades indicate many feature co-occurrences, while lighter shades indicate fewer. When a cell is selected, the local matrix (bottom right corner) shows in closer detail how the two features interact, using the same overall method. Thus, the local co-occurrence matrix in Fig. 2 suggests that, in this selection, many arguments contain one propositional attitude verb expressing a level of veridicality.
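As a rough illustration of how such a map could be computed, the sketch below reduces a feature matrix to two dimensions with PCA and, for the customized view, reduces a selected feature group to one dimension per axis. The data, the feature index, and the variable names are placeholders of our own, not the application's code.

```python
import numpy as np
from sklearn.decomposition import PCA

# X: one row of linguistic feature values per argument (n_args x n_features).
rng = np.random.default_rng(0)
X = rng.random((120, 66))              # stand-in for an annotated argument set
averidicality_cols = [12]              # hypothetical index of averidical-ratio

# Default map: project the full feature space onto two components.
xy = PCA(n_components=2).fit_transform(X)

# Customized map: a selected feature (group) on the x-axis, reduced to one
# dimension, plotted against the full feature set reduced to one dimension.
x = PCA(n_components=1).fit_transform(X[:, averidicality_cols])
y = PCA(n_components=1).fit_transform(X)
custom_xy = np.hstack([x, y])
```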

Overall, the argument exploration dashboard can be used to find balanced data sets for specific features and to explore and reduce imbalances in test sets. Furthermore, it provides an overview of the coverage of the corpus.

3.3 Learning Preferences via Visual Interactive Labeling

Our goal is to learn preferences from pairwise comparisons, as illustrated in Fig. 3. There, two different arguments are presented. In accordance with our definition of an argument, choosing the preferred argument means choosing the argument whose premise more strongly affects the acceptability of the conclusion (i.e., increases it for a support relation or decreases it for an attack relation). The task can be varied along several dimensions, e.g., by only presenting premises affecting the same conclusion, or only arguments with support relations. Thus, the system allows for some flexibility in the definition of comparison tasks.

Fig. 3.
figure 3

Visual interactive labeling

The annotation of argument preferences is an extremely expensive task because the number of comparisons \(n\) for a set of \(x\) arguments grows quadratically, \(n = x(x-1)/2\). Thus, a full annotation of 30 arguments already requires 435 comparisons. Because we want to test personalized argument preferences, we cannot use multiple annotators for the same model to reduce the annotation cost per annotator. Consequently, we have developed a system aimed at supporting this costly annotation process and possibly reducing the number of annotations needed to make valid predictions about a user’s argument preferences.

Learning Argument Preferences: For learning preferences, we represent arguments as linguistic feature vectors based on the annotations explained in Sect. 3.2. As an underlying model, we use a model for pairwise preference learning based on Gaussian Process Preference Learning [36, 37], a type of Bayesian inference model. These models define a real-valued function f that takes linguistic feature values as input and can be used to predict rankings, pairwise labels, and ratings for individual arguments [36]. Concretely, ratings are represented as numeric values provided by f, where higher values correspond to a stronger preference for the given argument based on its features. Pairwise labels are predicted via the preference likelihood \(p(i \succ j|f(x_i), f(x_j))\), where \(i \succ j\) is a pairwise label comparing two arguments (i.e., argument i is better than argument j).

The application does not hinge on this choice of model. However, preliminary tests have shown the model’s suitability for testing the overall pipeline. We chose it primarily for its good performance on sparse data: it can learn from relatively few comparisons, which makes learning the preferences of individual users more feasible.
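For readers who want a concrete picture of the pairwise set-up, the following is a simplified stand-in, not GPPL itself: it trains an off-the-shelf Gaussian process classifier on feature-difference vectors, whereas the actual model places a GP prior on the utility function f. All data and labels in this sketch are placeholders.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.random((30, 66))                       # feature vectors of 30 arguments
pairs = [(i, j) for i in range(30) for j in range(i + 1, 30)]   # 435 pairs
labels = rng.integers(0, 2, len(pairs))        # placeholder annotations:
                                               # 1 means "argument i preferred over j"

# Classify feature *differences* x_i - x_j as a crude pairwise preference model.
diffs = np.array([X[i] - X[j] for i, j in pairs])
model = GaussianProcessClassifier(kernel=RBF()).fit(diffs, labels)

# Estimated probability that argument 0 is preferred over argument 5.
p = model.predict_proba((X[0] - X[5]).reshape(1, -1))[0, 1]
```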

Visual Interactive Labeling: The visual interactive labeling process is divided into two parts. First, a small random subset of comparisons is sampled from the data set that is to be annotated. A user annotates this subset to provide some initial comparisons for model training. Once the subset is fully annotated, the second stage begins.

In the second stage, the user is supported by information from their preference model. Figure 3 illustrates our interface for visualizing model information. On the left-hand side, the two arguments are presented side-by-side. They are compared on a 5-point scale corresponding to the positions of the arguments (i.e., A1 is the left argument, and A2 is the right argument). The visualization on the right side guides users through the annotation process. It consists of two parts, separated by the arguments (represented by their IDs), which form the spine of the visualization. On the left side, an arc diagram provides information about the overall annotation progress by showing the already annotated argument pairs in gray. Additionally, the arc diagram visualizes predictions by the user’s model: the five green arcs suggest candidates for the next comparison based on the variance the model predicts for these comparisons. These suggestions are calculated globally across all arguments by default. The pair of arguments currently displayed for comparison is highlighted in pink as the comparison most favored by the model. The user can change the next pair of arguments by selecting other green arcs or by clicking on single arguments. This becomes relevant when argument-specific information is to be included in the decision process.

As displayed on the right side, each argument is represented by a tuple of bar charts describing its number of annotated comparisons in orange, its predicted absolute preference score in red, and the model’s certainty about this score in blue. Relying on the variance alone sometimes leads to a situation where only a certain subset of arguments is annotated frequently while other arguments are not annotated at all; users may wish to strive for a more balanced annotation process, and the visualization gives them the flexibility to do so. The visualization also allows users to investigate their annotation process by showing them the predicted ranking of the arguments based on their model. Thus, in addition to the concrete display of the model’s accuracy value, users can also confirm that the model learns the expected rankings for individual arguments.
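Continuing the stand-in model from the previous sketch (and not the CUEPipe's actual selection logic), a simple approximation of these variance-based suggestions is to propose the unannotated pairs the model is least certain about.

```python
import numpy as np

def suggest_pairs(model, X, annotated, k=5):
    """Return the k unannotated argument pairs the model is least certain
    about (predicted preference probability closest to 0.5), a simple
    proxy for variance-based suggestions.

    model: any fitted classifier with predict_proba over feature differences
    X: argument feature matrix (n_args x n_features), as a numpy array
    annotated: set of (i, j) index pairs that have already been compared
    """
    n = len(X)
    candidates = [(i, j) for i in range(n) for j in range(i + 1, n)
                  if (i, j) not in annotated]
    diffs = np.array([X[i] - X[j] for i, j in candidates])
    p = model.predict_proba(diffs)[:, 1]       # P(argument i preferred over j)
    uncertainty = -np.abs(p - 0.5)             # higher means less certain
    order = np.argsort(uncertainty)[::-1]
    return [candidates[idx] for idx in order[:k]]
```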

Ultimately, this visualization serves to investigate strategies for quickly increasing model accuracy, particularly during the annotation of large data sets. Once a large number of arguments is involved, it becomes infeasible to annotate all comparisons. Thus, making the right annotations to increase model predictability is essential. As of now, we rely only on data from the trained models; however, the issue has been gaining more attention recently (e.g., [13]). Thus, future work aims at improving the model’s capability to select meaningful comparisons.

3.4 Exploring Personal Preferences

The CUEPipe provides functionality for exploring preference models based on the previous steps of the pipeline. Concretely, we provide functionality for model performance analysis and model comparison across different users.

Model Performance Analysis: Users can apply their models to arbitrary data sets, testing them on unseen data. In addition, k-fold cross-validation (with k = 5) is provided, so that models can be trained on larger data sets involving both seen and unseen data, giving a more in-depth picture of a user’s model performance. We have also added functionality for re-calculating the model training history, which lets us investigate the model’s performance in relation to the annotation progress on a given data set.
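A sketch of what such a 5-fold evaluation over pooled comparisons might look like, again using the simplified stand-in model from the earlier sketches rather than GPPL; the function and variable names are our own.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.gaussian_process import GaussianProcessClassifier

def cross_validate(diffs, labels, k=5, seed=0):
    """k-fold cross-validation over pairwise comparisons: train the
    stand-in preference model on k-1 folds, score accuracy on the rest.

    diffs: numpy array of feature-difference vectors, one per comparison
    labels: numpy array of pairwise labels (1 = first argument preferred)
    """
    scores = []
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(diffs):
        model = GaussianProcessClassifier().fit(diffs[train_idx], labels[train_idx])
        scores.append(np.mean(model.predict(diffs[test_idx]) == labels[test_idx]))
    return float(np.mean(scores))
```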

Fig. 4.
figure 4

Model exploration

Comparative Model Exploration: The main visualization is presented in Fig. 4. It allows for the exploration and comparison of user models according to their predicted preferences. Again, we use principal component analysis to project the high-dimensional feature importance vectors provided by the user models onto a two-dimensional, radial space. Hence, models that are displayed close together share similar feature importance vectors. We use feature importance as an indicator of the impact of linguistic features on the prediction of argument preferences; thus, different feature importance values indicate different argument preferences.

To illustrate these differences, we separate the space into multiple slices, displayed similarly to a pie chart. The user may determine the number of slices. The individual pieces of the pie describe potential model clusters, i.e., models with similar feature importance patterns. The feature importance vectors of these models are aggregated and visualized in the outer ring of the visualization. This provides users with information on the differences between the various clusters. Color is used to associate each user model with its respective arc and to display important feature differences in the outer ring, thus supporting the differentiation between the model clusters.
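A possible reading of this layout in code, under our own assumption that the slices are equal angular sectors around the PCA origin:

```python
import numpy as np
from sklearn.decomposition import PCA

def radial_layout(importance_vectors, n_slices=6):
    """Project per-user feature-importance vectors to 2-D with PCA and
    assign each model to one of n_slices angular sectors (pie slices)."""
    xy = PCA(n_components=2).fit_transform(np.asarray(importance_vectors))
    angles = np.arctan2(xy[:, 1], xy[:, 0]) % (2 * np.pi)
    slices = (angles / (2 * np.pi) * n_slices).astype(int)
    return xy, slices

def slice_profiles(importance_vectors, slices, n_slices=6):
    """Aggregated importance profile per slice (shown in the outer ring);
    empty slices yield None."""
    V = np.asarray(importance_vectors)
    return [V[slices == s].mean(axis=0) if np.any(slices == s) else None
            for s in range(n_slices)]
```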

The model comparison visualization allows an analyst to cluster users and find commonalities between their models. To explore the models further, it is possible to extract the top and bottom arguments from the annotated data sets (and beyond, if model performance allows it) and feed them into the previously presented argument exploration view (Sect. 3.2). There, the feature distributions of the different sets (all, top, and bottom arguments) can be inspected.
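A minimal sketch of this top/bottom extraction, together with a simple feature contrast between the two sets; this is our own illustration, and the register-style analysis in the application is richer than a plain mean difference.

```python
import numpy as np

def top_bottom(scores, k=10):
    """Indices of the k most and k least preferred arguments according
    to the model's predicted preference scores."""
    order = np.argsort(scores)
    return order[-k:], order[:k]

def feature_contrast(X, top_idx, bottom_idx):
    """Mean feature difference between top and bottom arguments, a simple
    starting point for inspecting which features separate the two sets."""
    X = np.asarray(X)
    return X[top_idx].mean(axis=0) - X[bottom_idx].mean(axis=0)
```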

4 Study: Propositional Attitudes

We conducted a proof-of-concept case study to evaluate the functionality of the system. In the study, a linguist created a data set for exploring the impact of propositional attitude verbs on argument preferences. Subsequently, users were asked to compare arguments so that their preferences could be learned. Finally, the results were analyzed using the presented model exploration functionality.

Creating Data Sets: For this proof-of-concept study, we created a data set consisting of arguments containing propositional attitude verbs, a kind of embedding verb encoding the commitment of a source to the embedded content.

More concretely, the properties we are interested in relate to the intensional nature of (some) embedding verbs [9, 26]. One important notion is factuality or factivity [22], which also receives regular attention in the computational literature (e.g., [32, 40]).

For the sake of this paper, we understand factivity as a continuous value that describes the degree of commitment attributed to the content embedded under factivity markers, e.g., discover in (3-a) or believe in (3-b). However, as (3-b) illustrates, the source that the commitment is attributed to is also relevant. Thus, the fact that 3 out of 50 lawyers believe that piracy is theft is not a strong premise for the standpoint piracy is theft (cf. (3-b)); however, were it 47 out of 50 lawyers, then despite the weaker commitment indicated by believe (compared to discover), the premise might still provide good support.

figure c

The test and training data sets were created by a linguist based on 88 arguments containing propositional attitude verbs in the corpus. The data set was skewed towards the embedding verbs claim, think, agree, and show. This is illustrated in Fig. 5 (right side). The embedding verbs used in the 15 training arguments are shown on the left of Fig. 5.

Preference Learning Experiments: We tested five participants (i.e., users) in this study. All of them had an academic background (three student assistants and two postdocs). Each participant completed two annotation sessions. In the first session, they annotated preferences over ten arguments, resulting in 45 random comparisons. These later served as test sets for the models trained on their training sets, which comprised 105 comparisons. Due to the sparseness of the data, we tested the models both on seen and unseen data. The results in Table 2 show that while the models learned the argument preferences of some users relatively well within the seen data, applying them to unseen data shows that they have not learned enough to make general predictions about the users’ argument preferences (tested by combining the training and test sets and applying k-fold validation with k = 5, as provided by the application).

Table 2. Model performance (accuracy) across users
Fig. 5.
figure 5

Embedding verbs in the training set (left) and corpus (right)

Model Exploration: We fed the models from the annotation study into the model exploration dashboard. The dashboard shows that the three users with relatively high accuracy on the seen data formed a cluster with respect to the feature importance values of their models, while User1 and User3 formed their own clusters (see Fig. 4). We can see that the models with the best accuracy metrics generally have higher feature importance scores across features. This suggests that their preference patterns are more consistent with the underlying linguistic features. However, comparing the cluster of three in isolation reveals considerable differences in the importance of argument features, suggesting that the models are still quite distinct. Concretely, for User5, positive and negative sentiment were important features for the model, but the semantic features of veridicality and averidicality (i.e., those pertaining to factivity) did not seem to play a role. Conversely, User2 put focus on neutral sentiment, and the semantic features concerned with factivity were among the most important ones of their model. Finally, the third model in this cluster was mostly affected by features pertaining to linguistic complexity, while sentiment and factivity features played only a minor role.

Overall, the compactness of the study requires us to take these observations with a grain of salt. Nonetheless, this proof of concept shows that the pipeline can be used to inspect personal argument preferences across multiple users with only a few arguments. Informally collected feedback from the five users was, on average, very positive, although minor technical issues occurred during the study. However, we leave a more detailed analysis of the system’s usability for future work.

5 Limitations

In this section, we discuss limitations that pertain to the CUEPipe itself and to our proof-of-concept study.

5.1 The CUEPipe

Currently, CUEPipe is best suited for smaller pilot studies tailored toward the initial investigations of hypotheses. This limit is imposed on a technical level as well as on the level of implementation of the visualizations. On a technical level, the limit pertains mainly to the visual interactive labeling step. In the current implementation, model updates needed for making meaningful annotation suggestions require complete retraining of the model. This works well on smaller amounts of data but can interrupt the annotation process as the models become larger, both in terms of feature annotations and the number of comparisons. To some extent, this can be solved implementationally by optimizing training procedures (e.g., by running them asynchronously in the background and adapting the interface between the annotation interface and the trained preference models accordingly). Another possibility that could be explored is to create crowd models [37] and, for example, merge models of users with similar annotation behaviors. However, this would obviously lead to larger but (potentially) fewer models. Thus, ultimately, large-scale studies based on pairwise comparisons may require a more powerful infrastructure than is currently available.Footnote 6

Another issue of the CUEPipe is that the visualizations do not always scale optimally with increasing data complexity. In particular, representing complex feature annotations can clutter visualizations. Thus, organizing and representing linguistic features intuitively is an ongoing concern. Our goal is to improve on the current state, which only allows the selection and deselection of features. A more ambitious approach would be to incorporate guidance. For example, in the radial exploration visualization, such a system could attempt to automatically detect features relevant to distinguishing target groups and highlight them.

5.2 The Proof-of-concept Study

The study focused on the system’s overall usability, concentrating on the workflow described in Sect. 3.1. As mentioned there, the study should be seen as a prototype study. The main drawbacks are as follows:

The study participants were not selected with particular demographic properties in mind. Although the system can and should be used to find differences in seemingly homogeneous groups (in this case, all participants were academics), a study geared towards predicting predefined clusters in a target group may illustrate the validity of the system more clearly.

Overall, the study is small-scale. Thus, as mentioned in Sect. 4, the results should be taken with a grain of salt. This is further compounded by the fact that we rely on automated feature annotations. While this is fine for some features, e.g., those pertaining to language complexity, the meaning-oriented features in particular, such as veridicality, need to be evaluated carefully to avoid propagating wrong information into the analysis stage of the system. For example, the system broadly captures the right generalizations regarding the relation between attitude verbs and veridicality, but there are some outliers that can distort the results. Concerning the first problem, future studies are planned with a focus on exploring the individual properties of target groups. Regarding the second problem, including evaluation metrics for the features may make the system more transparent.

6 Conclusion

We have presented an application combining three major components for researching personalized argument preferences: data collection, preference labeling, and preference exploration. We also contribute a small (but dynamic) corpus of linguistically annotated arguments and various techniques for visual analysis of linguistic data. The CUEPipe application has been demonstrated by means of a proof-of-concept study, indicating that the overall workflow is successful.

The pipeline opens up multiple avenues for future work, e.g., facilitating comparative annotation, the visual representation of linguistically annotated data, and the visual exploration of linguistic models. Overall, the CUEPipe provides exciting prospects for exploring personalized argument preferences. Its coverage of several major tasks in linguistic research makes it interesting for anyone working on argument preferences. Furthermore, its ease of use lowers the barrier for users new to the topic to conduct these tasks.