How to Win Arguments

Klaus Weber, Augsburg University, Augsburg, Germany (klaus.weber@informatik.uni-augsburg.de)
Niklas Rach, Ulm University, Ulm, Germany (niklas.rach@uni-ulm.de)

People make decisions and form opinions every day based on persuasion processes, whether through advertising, when planning leisure activities with friends, or in public speeches. Most of the time, however, subliminal persuasion processes triggered by behavioral cues (rather than by the content of the message) play a far more important role than most people are aware of. To raise awareness of the different aspects of persuasion (how and what), we present a multimodal dialog system consisting of two virtual agents that use synthetic speech in a discussion setting to present the pros and cons of a controversial topic to a user. Using Reinforcement Learning, the agents adapt their emotions during the interaction based on explicit user feedback in order to increase their perceived persuasiveness.


Introduction
The communication of opinions, along with different pro- and counter-arguments, is an important factor in the process of opinion building. However, people are persuaded by far more than just the rational content of arguments. Presenting identical content in different ways, e.g., with different body language, gaze behavior, or emotions, can have a different effect on the audience's opinion of the argument and its overall stance on the topic [1,4,6] and, thus, on the persuasive effectiveness of the conveyed content [25]. Especially in public speeches and political debates, it matters not only what is said (semantic content) but also how it is said. Recent public speeches, especially by politicians, suggest that people are more likely to be influenced by the behavior or authority of speakers than by the content of the message they convey. A prominent example is the controversially discussed Brexit, in which supporters demonstrably put forward arguments of questionable validity. For instance, during the 2016 referendum campaign it was claimed that remaining in the EU costs around £350 million a week, a figure that disregards the rebate the UK had been granted since 1984 [7]. Although it is impossible to determine retrospectively what would have happened had only logically valid arguments been used, it seems that people are not aware of the subliminal persuasive framing caused by their own emotions and those of the speakers.
Within this article, we present the ongoing research project EVA (Empowering Virtual Agents), in which we investigate two different persuasive aspects of argumentation: (1) the effect of what is said and (2) the effect of how it is said. Our approach therefore includes logical argumentation that focuses on the content and order of the arguments (what to say) as well as subliminal persuasion employing non-rational argumentation, i.e., body-language-based argumentation (how to say it). To this end, we present a novel multimodal dialog system in which two embodied virtual agents discuss a specific (controversial) topic through synthetic speech and emotions. The what-to-say component of the two agents is based on a logical argumentation strategy that was optimized beforehand in self-play using objective quality criteria for effective argumentation [16]. In contrast, the persuasive power of the how-to-say component, reflected by the agents' body language, is adapted during the interaction with the user. In particular, the user can give explicit feedback on whether he or she finds the presentation convincing, not convincing, or neither (neutral). By allowing users to tune the how-to-say component of the agents, we enable them to explore the impact of subliminal cues on the persuasive power of arguments and thereby raise awareness that people are influenced by adapted behavior.
In Sect. 2, we give an overview of research on persuasion. Sect. 3 describes the overall approach, a new formalization of the merged learning approach, and a thorough overview of the currently employed logical and non-logical policies, while Sect. 4 presents the results at the current stage of the research.

General Theory
The theory of persuasion goes back to the Greek philosopher Aristotle, who identified three means of persuasion, namely (1) logos (the logical and rational aspects, i.e., the content), (2) pathos (the emotional engagement between persuader and persuadee), and (3) ethos (the persuader's personality and character) [11].
The psychological persuasion models developed by Petty et al. [12] (Elaboration Likelihood Model, ELM) and Chaiken et al. [3] (Heuristic-Systematic Model, HSM) describe how information processing influences the outcome of persuasive messages. A persuasive message can be processed via two different cognitive routes, namely central and peripheral processing. Central processing focuses on the content of the message communicated by the persuader/agent, while peripheral processing focuses on the agent's expression. However, people do not process information via the central or the peripheral route in isolation [3]. Instead, peripheral processing always takes place, and central processing is added once an elaboration threshold is reached. In this case, the two routes are used with different intensities depending on the audience's "need for cognition" [12].

Non-verbal Signals and Persuasion
The persuasive effect of non-verbal signals in Human-Agent Interaction has been investigated extensively. For instance, robots using non-verbal cues (body language, gaze, ...) have been shown to be perceived as more persuasive than robots that do not use them. Chidambaram et al. [4], for example, compared the effect of non-verbal cues (gestures, gaze, ...) with that of vocal cues and found a stronger persuasive effect for the non-verbal cues. Further, Ham et al. [6] observed that gazing increased the persuasive effect more than using gestures alone. Andrist et al. [1] found that practical knowledge and rhetorical ability can also affect the persuasive effect. The EASI theory (Emotions As Social Information, van Kleef et al. [23]) states that emotions exert both intra- and interpersonal influences, i.e., the outcome of a persuasive message can be affected both by the persuadee's own emotions and by the emotions of the persuader conveying it. According to this theory, two different processes exist, namely inferential processes and affective reactions. In addition, two classes of moderators determine which of these two processes takes precedence: appropriateness and information processing. The latter depends, similar to the ELM, on the persuadee's motivation and ability to engage in thorough information processing (epistemic motivation). Studies have supported this theory and shown that people use the emotions of the source as an information channel when forming their own attitudes [24]. In accordance with this, DeSteno et al. [5] showed that persuasive messages are more successful if they are framed with emotional overtones that correspond to the emotional state of the recipient.
The Brexit debate suggests that public opinion is strongly driven by emotions and not just by the validity of arguments. The literature likewise provides evidence that the perceived persuasiveness of arguments depends largely on emotions. We therefore chose to focus on emotions as a major component of subliminal persuasion.

Concept and Approach
In the following, we present an overview of our approach and provide a high-level formal description of the overall concept (see Sect. 3.1) as well as a thorough discussion of the two employed policies: (1) the policy $\pi_{emotion}$ for emotional behavior generation (see Sect. 3.2.2) and (2) the policy $\pi_{argument}$ for the argumentation strategy (see Sect. 3.2.1).
For every interaction step $t$, each agent (one after the other) selects an utterance as well as a corresponding emotion based on its learned merged policy (see Def. 4) and presents it to the user. The user provides the agents with feedback, which is used to optimize the policy (here only $\pi_{emotion}$; see Fig. 1).

Fig. 1 Overview of our approach with two virtual agents debating a controversial topic, employing a pre-trained logical strategy $\pi_{argument}$ as well as a dynamically learned behavior strategy $\pi_{emotion}$ that is adapted to the user/persuadee based on the user's explicit feedback on how convincing an argument is perceived (convincing, neutral, not convincing). This system is an extension of the rule-based approach presented in [18] (including the trained logical strategy) and the adaptation approach (without trained logical strategy) presented in [28].

Formalization
Formally, our approach can be described as a Markov Game with $|I|$ players (here: 2), given as a 5-tuple $(I, S, A, R, T)$, where $I$ is the set of players, $S$ the merged state space, $A$ the joint action space, $R$ the joint reward function, and $T: S \times A \times S \to [0,1]$ the transition function, as follows [2]:

Definition 1 (Merged State Space) Let $S_{emotion}$ be the sub-state space used to determine the next emotion and $S_{argument}$ the sub-state space used to determine the next argument. Then the system's merged state space $S$ is defined as
$$S := S_{emotion} \times S_{argument} \qquad (1)$$

Definition 2 (Joint Action Space) Let $A_{emotion,i}$ be the emotion sub-action space and $A_{argument,i}$ the argumentation sub-action space for player $i \in I$. Further, let $A_i := A_{emotion,i} \times A_{argument,i}$ be the respective merged action space for player $i \in I$. Then the joint action space is defined as
$$A := \prod_{i \in I} A_i = A_1 \times A_2 \qquad (2)$$
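A minimal sketch of how the merged and joint spaces compose, assuming the operator between the sub-spaces is the Cartesian product ($S = S_{emotion} \times S_{argument}$, $A_i = A_{emotion,i} \times A_{argument,i}$, $A = A_1 \times A_2$). The sub-spaces below are hypothetical toy sets; the real ones come from the argument structure and the emotion set:

```python
from itertools import product

# Hypothetical toy sub-spaces (placeholders, not the system's real spaces)
S_emotion = ["s_e1", "s_e2"]
S_argument = ["s_a1", "s_a2", "s_a3"]
A_emotion = {1: ["happy", "angry"], 2: ["happy", "angry"]}
A_argument = {1: ["argue_1", "why_1"], 2: ["argue_2", "why_2"]}
players = [1, 2]  # the player set I

# Merged state space: every combination of an emotion and an argument sub-state
S = list(product(S_emotion, S_argument))

# Merged action space per player: (emotion, argumentative move) pairs
A_i = {i: list(product(A_emotion[i], A_argument[i])) for i in players}

# Joint action space: one merged action per player
A = list(product(*(A_i[i] for i in players)))
```

With two emotion sub-states and three argument sub-states this yields 6 merged states, and the 4 merged actions per player give 16 joint actions, which illustrates how quickly the joint space grows.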

Definition 3 (Joint Reward Function)
Let $R_{emotion,i}: S \times A \times S \to \mathbb{R}$ be the reward function for the emotional part and $R_{argument,i}: S \times A \times S \to \mathbb{R}$ the reward function for the rational part for player $i \in I$. Further, let $R_i := R_{emotion,i} \times R_{argument,i}$ be the respective merged reward function. Then the joint reward function $R$ is defined as
$$R := \prod_{i \in I} R_i = R_1 \times R_2 \qquad (3)$$

Definition 4 (Joint Policy) Let $\pi_{emotion,i}$ be the emotion policy and $\pi_{argument,i}$ the argumentation policy for player $i \in I$. Further, let $\pi_i := \pi_{emotion,i} \times \pi_{argument,i}$ be the respective merged policy, with $\pi_i: S \to \Delta(A_i)$ the distribution over possible actions given a state $s \in S$. Then the joint policy is defined as
$$\pi := \prod_{i \in I} \pi_i = \pi_1 \times \pi_2 \qquad (4)$$

In the Markov Game presented here, the agents select and execute their actions one after the other (not simultaneously). We therefore define an awaiting action $a_w \in A_i$ for every agent $i \in I$ whose turn it is not. Further, we denote any state $s_t \in S$ (respectively action $a_t \in A_i$) for agent $i \in I$ at time step $t$ as $s^i_t$ ($a^i_t$). Since both agents have opposing goals, the game is a zero-sum game. The optimal joint policy corresponds to a Nash equilibrium, i.e., no agent can gain an advantage by changing its policy $\pi_i$ as long as the policy of the opposing player is kept fixed.

Note that the emotional and argumentative state and action spaces differ considerably in size and complexity. Since the argumentation policy $\pi_{argument,i}$ has to differentiate between all available arguments, optimizing it in real time (during the interaction with a user) is not feasible. On the other hand, research has yielded several objective quality criteria for logical argumentation that can be used to define an appropriate reward. Since such criteria are not necessarily available for the emotional policy $\pi_{emotion,i}$ (due to its subjectivity), it has to be adapted in real time and directly to the user's response. Hence, at the current state of our work, we split the learning process into two phases (pre-training and real-time) and optimize the two parts of the policy separately. Both learning phases are discussed in more detail in the following subsections.
Algorithm 1 sketches the general procedure during the interaction.
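The following hypothetical Python loop illustrates the procedure described in this section; it is not the authors' Algorithm 1, and all names (`run_interaction`, `FixedEmotion`, the lambdas of the toy run, the placeholder reward) are assumptions. The argument policy is treated as frozen, and only the emotion policy receives updates:

```python
def run_interaction(players, argument_policy, emotion_state, emotion_policy,
                    get_feedback, compute_reward, max_steps):
    """Hypothetical interaction loop: agents take turns, pi_argument is frozen,
    and only pi_emotion is updated from the user's feedback."""
    log = []
    for t in range(max_steps):
        i = players[t % len(players)]             # agents act one after the other
        arg = argument_policy(i, t)               # what to say (pre-trained)
        if arg is None:                           # dialog game has ended
            break
        state = emotion_state(i, arg)             # s_emotion derived from the argument
        emotion = emotion_policy.select(state)    # how to say it
        feedback = get_feedback(i, arg, emotion)  # user: convincing / neutral / not
        reward = compute_reward(i, feedback)
        emotion_policy.update(state, emotion, reward)  # real-time adaptation
        log.append((t, i, arg, emotion, feedback, reward))
    return log


class FixedEmotion:
    """Trivial stand-in for pi_emotion that always shows the same emotion."""

    def __init__(self, emotion):
        self.emotion = emotion
        self.updates = []

    def select(self, state):
        return self.emotion

    def update(self, state, action, reward):
        self.updates.append((state, action, reward))


# Toy run: three scripted arguments; this user finds "angry" convincing.
script = {0: "arg1", 1: "arg2", 2: "arg3"}
policy = FixedEmotion("angry")
log = run_interaction(
    players=[1, 2],
    argument_policy=lambda i, t: script.get(t),
    emotion_state=lambda i, arg: (i, arg),
    emotion_policy=policy,
    get_feedback=lambda i, arg, emotion: 1.0 if emotion == "angry" else 0.0,
    compute_reward=lambda i, feedback: feedback - 0.5,  # placeholder reward
    max_steps=10,
)
```

Separating `select` and `update` keeps the loop agnostic to the concrete learner, so any tabular or approximate RL method with this interface could be plugged in.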

The Policy
Since persuasion depends not only on the content but also on the way a message is conveyed, the joint policy employed by the agents merges two sub-policies, $\pi_{argument}$ and $\pi_{emotion}$, as formally described at a high level in Sect. 3.1. In the following, we give a more in-depth overview of the two policies separately.

Learning of the Argumentation Policy $\pi_{argument}$
A significant amount of research has been conducted to formalize arguments and argumentation strategies in dialogs (see, e.g., [15] for an overview). Our approach builds on these results and uses a dialog game framework tailored to persuasion to structure the interaction between the two agents. The respective dialog game can formally be described as a tuple $(L, D)$, with $L$ a logic for defeasible argumentation and $D$ the dialog system proper [14]. Consequently, $L$ encodes a formal representation of the available arguments (including their relations to each other), and $D$ defines the rules of the interaction. This includes aspects such as turn-taking, winning criteria, and the moves allowed in each state of the interaction.
Within this project, we use the dialog game for argumentation introduced in [13], since it was designed to provide a flexible framework that still ensures consistent dialogs. The framework allows for five different speech acts, namely claim($\varphi_i$), argue($\varphi_i$, so $\varphi_j$), why($\varphi_i$), retract($\varphi_i$), and concede($\varphi_i$), with $\varphi_i$ and $\varphi_j$ argument components in $L$. The available argument components $\{\varphi_i\}$ are encoded following the argument annotation scheme introduced in [21]. This choice is motivated by the ambition to combine the presented system with existing argument mining approaches, which keeps the system flexible with respect to the discussed topics. The annotation scheme allows for three different types of components (Major Claim, Claim, Premise). Note that a Claim component in the argument structure is unrelated to the claim speech act in the dialog game, since the two belong to separate frameworks. In addition to the three types of components, the annotation scheme allows for two relations between components, namely support and attack. Each component apart from the Major Claim (which has no relation) has exactly one unique relation to another component. We denote the set of arguments that can be constructed from the annotated set as $Args$. The arguments $arg_i \in Args$ have the form $arg_i = (\varphi_i, \text{so } \varphi_j)$ if $\varphi_i$ supports $\varphi_j$ and $arg_i = (\varphi_i, \text{so } \neg\varphi_j)$ if $\varphi_i$ attacks $\varphi_j$. Each argument $arg \in Args$ refers to one of the two stances $\{+, -\}$ on the topic. For the sake of simplicity, the Major Claim is defined as $arg_0 = \varphi_0$. The complete setup, including a discussion of an annotated structure and a preliminary study on how the resulting artificial argumentation was perceived by a human audience, was presented in [17].
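A hypothetical encoding of the annotated structure described above: components with at most one support or attack relation, the argument tuples $(\varphi_i, \text{so } \varphi_j)$ and $(\varphi_i, \text{so } \neg\varphi_j)$, and the five speech acts. The stance rule (a supporting component inherits the stance of its target, an attacking one takes the opposite stance) is our assumption; the text only states that every argument has one of the two stances. All class and function names are illustrative:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SpeechAct(Enum):
    """The five speech acts of the dialog game."""
    CLAIM = "claim"
    ARGUE = "argue"
    WHY = "why"
    RETRACT = "retract"
    CONCEDE = "concede"


@dataclass(frozen=True)
class Component:
    cid: int
    text: str
    target: Optional[int] = None    # id of the related component (None for the Major Claim)
    relation: Optional[str] = None  # "support" or "attack" (None for the Major Claim)


def build_args(components):
    """Map each component id to its argument tuple."""
    args = {}
    for c in components.values():
        if c.target is None:                      # arg_0 = phi_0
            args[c.cid] = (c.cid,)
        elif c.relation == "support":             # (phi_i, so phi_j)
            args[c.cid] = (c.cid, "so", c.target)
        else:                                     # (phi_i, so not phi_j)
            args[c.cid] = (c.cid, "so", ("not", c.target))
    return args


def stance(components, cid):
    """Assumed rule: support keeps the target's stance, attack flips it."""
    c = components[cid]
    if c.target is None:
        return "+"                                # the Major Claim defines the + side
    parent = stance(components, c.target)
    if c.relation == "support":
        return parent
    return "-" if parent == "+" else "+"


# Toy structure with placeholder premise texts (not taken from the paper's data)
components = {
    0: Component(0, "Marriage is an outdated institution"),
    1: Component(1, "premise A", target=0, relation="support"),
    2: Component(2, "premise B", target=0, relation="attack"),
    3: Component(3, "premise C", target=2, relation="attack"),
}
args = build_args(components)
move = (SpeechAct.ARGUE, args[2])  # e.g. argue(phi_2, so not phi_0)
```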
In the next step, we aim to explore the freedom within the dialog framework to optimize the agents' logical strategy. To do so, we again formalize the problem as a Markov Game, which allows us to address policy optimization as a multi-agent Reinforcement Learning task. The advantage of the Markov Game formalism is that it depends neither on a pre-defined strategy nor on additional annotated training data, provided a formal reward function is given. Moreover, it allows the agent to explore different (and possibly unknown) strategies instead of imitating a human policy. The formal description can be derived from the general formalism in Sect. 3.1 by excluding the emotional aspects, which is possible because the introduced formalization separates the two aspects in the action and state spaces as well as in the reward function and the policy. The formal translation of a generalized dialog game for argumentation into a Markov Game used here was discussed extensively in [16]. The reward functions $R_{argument,i}$ of both agents $i \in I$ are currently based on the winning criterion of the underlying dialog game for argumentation, since it provides a formal indicator for the outcome of the interaction. However, since the perception of the content is also highly subjective, our ongoing work focuses on including user feedback in the reward and combining it with the formal aspects. The complete pipeline, including two argument structures, the dialog game for argumentation, the optimized policy $\pi_{argument}$, and a virtual agent, was presented in [18].
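A hypothetical sketch of a reward that depends only on the winning criterion of the dialog game. The values +1 for the winner, -1 for the loser, and 0 for non-terminal steps or undecided endings are our assumption; with two players they make the rewards sum to zero, in line with the zero-sum setting of Sect. 3.1:

```python
def argument_rewards(players, done, winner=None):
    """Sparse reward R_argument,i from the dialog game's winning criterion.

    Non-terminal steps and undecided endings yield 0 for everyone; at the end
    the winner gets +1 and every other player -1 (two players -> zero-sum).
    """
    if not done or winner is None:
        return {i: 0.0 for i in players}
    return {i: (1.0 if i == winner else -1.0) for i in players}
```

Such a sparse terminal reward needs no human judgment, which is what makes offline self-play training of $\pi_{argument}$ possible.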

Learning of the Emotion Policy $\pi_{emotion}$
In contrast to the argumentation policy $\pi_{argument}$, the sub-policy $\pi_{emotion}$ is learned during the interaction with the user. To enable the agents to adapt their behavior, we employ Reinforcement Learning, as it is a suitable tool for real-time adaptation in human-agent scenarios, as shown in [20,26,27].
The policy $\pi_{emotion}$ depends on the argumentation policy $\pi_{argument}$, since the type of argument (or at least its content) determines which emotion is the right one to choose. Consequently, the sub-state $s_{emotion}$ is derived from the argumentation sub-action $a_{argument} \in A_{argument}$ as described in the following.

Definition 5 (State Space $S_{emotion}$) Let $a^i_{argument,t} = (arg) \in A_{argument,i}$ be the argument sub-action with $arg \in Args$ chosen at time step $t$ for agent $i \in I$, $stan: Args \to \{+, -\}$ the stance and $rel: Args \to \{attack, support\}$ the relation of argument $arg$. Further, let $score: Args \to [-1, 1]$ be the normalized compound score of the sentiment analysis of an argument. Then, considering it is player $i$'s turn, the sub-state $s_{emotion}$ is defined as
$$s_{emotion} := (i, stan(arg), rel(arg), score(arg)) \qquad (5)$$
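A minimal sketch of building $s_{emotion}$ from the chosen argument. Discretizing the sentiment score into a few equal-width bins is our assumption (Definition 5 uses the raw score); it is made here only so that the state can key a lookup table. `Argument` and `emotion_state` are hypothetical names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    aid: int
    stance: str       # "+" or "-"
    relation: str     # "attack" or "support"
    sentiment: float  # normalized compound sentiment score in [-1, 1]


def emotion_state(agent, arg, bins=3):
    """s_emotion = (i, stan(arg), rel(arg), score bucket) for the agent in turn.

    The score is mapped to `bins` equal-width buckets over [-1, 1]
    (bucket 0 = most negative); a score of exactly 1 goes into the last bucket.
    """
    if not -1.0 <= arg.sentiment <= 1.0:
        raise ValueError("sentiment score must lie in [-1, 1]")
    bucket = min(int((arg.sentiment + 1.0) / 2.0 * bins), bins - 1)
    return (agent, arg.stance, arg.relation, bucket)
```

For example, an attacking con argument with score -0.6 presented by agent 2 maps to the state `(2, "-", "attack", 0)`.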

Definition 6 (Action Space $A_{emotion}$)
The discrete action space $A_{emotion}$ consists of the employed emotions, such as happy, disappointed, sad, and angry, each of which can be displayed both by facial expressions and by gestures.
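The text does not name the RL algorithm used for $\pi_{emotion}$. As an illustration only, the following assumes tabular Q-learning with ε-greedy exploration over a small emotion set; the hyperparameters, the emotion list (which adds `neutral`, as used in the dialog example), and the toy reward in the training loop are assumptions:

```python
import random
from collections import defaultdict

# Example A_emotion; "neutral" is added because the dialog example uses it
EMOTIONS = ("neutral", "happy", "disappointed", "sad", "angry")


class EmotionQLearner:
    """Tabular Q-learning over A_emotion with epsilon-greedy exploration."""

    def __init__(self, actions=EMOTIONS, alpha=0.1, gamma=0.9, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(float)               # (state, action) -> value, starts at 0
        self.rng = random.Random(seed)

    def select(self, state):
        if self.rng.random() < self.epsilon:      # explore
            return self.rng.choice(self.actions)
        values = [self.q[(state, a)] for a in self.actions]
        best = max(values)                        # exploit, breaking ties at random
        return self.rng.choice([a for a, v in zip(self.actions, values) if v == best])

    def update(self, state, action, reward, next_state=None):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        future = 0.0
        if next_state is not None:
            future = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * future
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Toy training: in this state only "angry" is rewarded.
learner = EmotionQLearner(epsilon=0.2, seed=0)
state = (2, "-", "attack", 0)
for _ in range(200):
    a = learner.select(state)
    learner.update(state, a, 1.0 if a == "angry" else -1.0)
learner.epsilon = 0.0
```

A small table like this converges within a few hundred interactions for a handful of states, which is why keeping $S_{emotion}$ compact matters for real-time adaptation.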
For learning $\pi_{emotion}$, we use the user's feedback (convincing, neutral, not convincing) to compute a prediction of the current user's stance during the interaction, as introduced in [28]. The employed prediction model is based on bipolar weighted argument graphs (BWAG), which assign a weight $w_i \in [0,1]$ to each argument $arg_i \in Args$. This weight is used to compute the argument's strength $s_i$, taking into account the strengths of its child arguments in the argument graph (see Fig. 2).
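The exact aggregation used by the BWAG model in [28] is not given in this text. As a stand-in, the following sketch uses the DF-QuAD gradual semantics, a known aggregation for bipolar weighted graphs, and assumes that an argument's weight is the user's feedback value (0, 0.5, or 1) once it has been rated and 0.5 otherwise. `strength` and `weights_from_feedback` are hypothetical names:

```python
from math import prod


def weights_from_feedback(arg_ids, feedback, default=0.5):
    """Assumed weighting: feedback value if given, else a neutral 0.5."""
    return {a: feedback.get(a, default) for a in arg_ids}


def strength(node, weights, children):
    """DF-QuAD-style strength of `node` in a bipolar weighted argument graph.

    weights:  argument id -> base weight w_i in [0, 1]
    children: argument id -> list of (child id, "support" | "attack")
    """
    sup, att = [], []
    for child, rel in children.get(node, []):
        s = strength(child, weights, children)
        (sup if rel == "support" else att).append(s)
    # Probabilistic-sum aggregation (0 when there are no children of that kind)
    v_sup = 1.0 - prod(1.0 - s for s in sup)
    v_att = 1.0 - prod(1.0 - s for s in att)
    w = weights[node]
    if v_att >= v_sup:
        return w - w * (v_att - v_sup)
    return w + (1.0 - w) * (v_sup - v_att)


# Toy graph: argument 1 supports and argument 2 attacks the Major Claim 0
children = {0: [(1, "support"), (2, "attack")]}
w = weights_from_feedback([0, 1, 2], {1: 1.0, 2: 0.0})  # 1 convincing, 2 not
lambda_t = strength(0, w, children)                     # predicted stance s_0
```

Here the convincing supporter and the unconvincing attacker push the root to 1.0; if instead only the attacker were rated convincing, the root would drop to 0.25.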
The strength $s_0$ of the root argument is finally taken as the predicted user's stance $\lambda_t$ (Def. 7) at time step $t$. This approach has the advantage that the underlying argument structure is taken into account in the learning process [28]. Consequently, the reward function $R_{emotion}$ is defined as follows.

Definition 7 (Reward Function $R_{emotion}$) Let $f_t: Args \to \{0.0, 0.5, 1.0\}$ be the user feedback function, which specifies how convincing the user perceives an argument at time step $t$, ranging from 0 (not convincing) to 1 (convincing). Further, let $\lambda^i_t: (f_1, \dots, f_t) \to [0,1]$ be a function that maps the feedback signals up to time step $t$ to a prediction of the current user's stance with respect to agent $i \in I$ (see Weber et al. [28] for more information about the prediction model). Then the reward $R_{emotion,i,t}$ at time step $t$ is defined as
$$R_{emotion,i,t} := \lambda^i_t - \lambda^i_{t-1}$$
with $\lambda_0 = 0.5$.

While employing the user's feedback directly as a reward signal allows for quick adaptation, using a prediction of the user's stance allows for learning a more fine-grained strategy that depends on the actual outcome of the debate. This is because the prediction model allows the agents to observe their current position with respect to the opponent. The advantage of such an approach is that an agent can learn to adopt the opponent's strategy when it notices that it currently holds the worse position. Further, argument-specific and structure-specific information can be used to provide more fine-grained information for learning. In addition, behavior learning can be combined with fine-grained logical strategies [28].
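A minimal sketch of the per-step emotion reward, assuming it is the change of the agent-relative predicted stance between consecutive steps, $\lambda^i_t - \lambda^i_{t-1}$ with $\lambda_0 = 0.5$, and assuming that the con agent sees $1 - \lambda_t$ as its own prediction. Both function names are hypothetical:

```python
def stance_for_agent(lam, agent_stance):
    """Agent-relative prediction: the pro agent ("+") sees lambda_t,
    the con agent ("-") sees 1 - lambda_t (an assumption)."""
    return lam if agent_stance == "+" else 1.0 - lam


def emotion_rewards(lambdas, agent_stance, lambda_0=0.5):
    """Per-step rewards lambda^i_t - lambda^i_{t-1}, starting from lambda_0."""
    prev = stance_for_agent(lambda_0, agent_stance)
    rewards = []
    for lam in lambdas:
        cur = stance_for_agent(lam, agent_stance)
        rewards.append(cur - prev)
        prev = cur
    return rewards
```

For a predicted stance sequence 0.75, 0.5, the pro agent receives rewards 0.25 and -0.25, while the con agent receives the mirrored values -0.25 and 0.25.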
In the next step, we aim to explore the effect of adapting emotions to the listener, both to verify that adaptation leads to higher persuasive effectiveness and to raise awareness of the subliminal influence of adapted emotions.

Dialog Example
To give the reader a better understanding of the overall system, we sketch below a detailed example of an interaction between the two agents and the user.
The dialog presented here concerns the claim $\varphi_0$ = "Marriage is an outdated institution". The stances are as follows: (i) left agent: $\varphi_0 \to stan = +$; (ii) right agent: $\neg\varphi_0 \to stan = -$. For the sake of simplicity, we assume that the user is against the claim ($\neg\varphi_0$) and finds the right agent (agent 2) convincing if attacking arguments are presented with an angry emotion. The left agent (agent 1) is found convincing if the sentiment of the argument is negative. In all other cases, a negative reward is given. Fig. 3 shows the example dialog between the two agents.
The first argument, given by agent 1, is presented with a neutral emotion and is therefore not convincing, leading to a negative reward for the agent. The next argument, by agent 2, is an attacking argument presented with an angry emotion and is therefore found convincing. Why speech acts do not require feedback because they contain no argument; their reward is therefore always zero. The fourth argument is found not convincing due to the nature of our fictitious user, while the last argument, given by agent 1, is found convincing again since the correct behavior is chosen.
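The fictitious user of this example can be written as a small reward function. This is a hypothetical encoding of the rules stated above; the magnitudes (+1 if convincing, -1 otherwise, 0 for why moves) and the sentiment threshold of 0 are our assumptions:

```python
def fictitious_reward(agent, act, relation=None, sentiment=None, emotion=None):
    """Reward given by the fictitious user of the example dialog.

    agent 2 (right) convinces with attacking arguments shown angrily,
    agent 1 (left) convinces with arguments of negative sentiment,
    why moves carry no argument and are not rated.
    """
    if act == "why":
        return 0
    if agent == 2 and relation == "attack" and emotion == "angry":
        return 1
    if agent == 1 and sentiment is not None and sentiment < 0:
        return 1
    return -1
```

Such a scripted user is a convenient way to test that the emotion learner converges to the intended behavior before running it with real participants.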

Evaluation Results
In the following, we highlight the results of our research at the current stage. As we split the learning process into two phases, we discuss each of them separately.
Argumentation policy. For the argumentation strategy, the combination of argument structure and dialog game in a multi-agent setup was evaluated in a user study [17] by comparing transcripts of artificial discussions with human-generated ones. In the course of the survey, each participant answered questions regarding the logical consistency of the dialogs, the argumentation strategy of the agents, and the naturalness of the interaction on a five-point Likert scale. The results showed that the agent-agent dialogs were perceived as logically consistent, but also that the human-generated dialogs were rated significantly higher in naturalness of the interaction. We concluded that the utilized methods are generally suitable for our task (as they lead to cogent interactions), but that the low perceived naturalness indicates room for improvement with respect to the perceived quality of the argumentation.
In the next step, the Markov Game formalism was applied to explore the freedom of choice for the agents that is provided by the utilized dialog game framework. We evaluated the approach in a proof-of-principle setup with a reward function for which the optimal policy is known [16]. We trained a policy for both sides of the discussion by means of RL on ten randomly generated argument structures in order to exclude topic dependencies of the results. The trained policies were evaluated against the known optimal policy based on probabilistic rules and were shown to lead to the expected outcome. We concluded that, for the investigated case, the optimal policy could be found with the applied techniques.

Fig. 3 Example dialog between the two agents showing different arguments presented with emotions and the given feedback of an exemplary fictitious human user

Emotion policy. We evaluated the adaptation of the emotional policy in a simulation showing that the agents are able to increase their perceived persuasive effectiveness ($\lambda_t$, Fig. 4) by using the provided human feedback signals, even for high noise (as user feedback is not deterministic) [28].

Fig. 4 Simulation results including 95% confidence intervals, depicting the cumulative prediction $\lambda_t$ and showing a continuous increase over time even for high noise (taken from [28])
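The noise model and interval computation behind such simulation curves are not specified here. As an illustrative assumption, noisy feedback can be simulated by replacing the true feedback with a random value from {0.0, 0.5, 1.0} with probability `noise`, and a 95% confidence interval over repeated runs can be approximated with the normal quantile 1.96. Both helper names are hypothetical:

```python
import random
import statistics

FEEDBACK = (0.0, 0.5, 1.0)


def noisy_feedback(true_value, noise, rng):
    """With probability `noise`, replace the user's true feedback by a random one."""
    if rng.random() < noise:
        return rng.choice(FEEDBACK)
    return true_value


def mean_ci95(values):
    """Mean and normal-approximation 95% confidence interval (needs >= 2 values)."""
    m = statistics.mean(values)
    half = 1.96 * statistics.stdev(values) / len(values) ** 0.5
    return m, m - half, m + half


rng = random.Random(42)
samples = [noisy_feedback(1.0, 0.5, rng) for _ in range(1000)]
```

For few runs, a Student-t quantile instead of 1.96 gives a more conservative interval.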
Further, we evaluated the prediction model in a user study with 48 participants in order to verify its practical potential, accuracy, and validity. For this purpose, the participants listened to an agent presenting pro and con arguments on visiting a hotel and were asked to give direct feedback on whether they found each argument convincing. This feedback was used to calculate the prediction $\lambda_t$ of the user's stance (see Definition 7). In a post-study questionnaire, the participants were asked whether or not they would like to visit the hotel. We then evaluated to what degree the predicted stance and the subjective decision matched. Table 1 and Fig. 5 summarize the results [28], which show that the prediction model of the user's stance is very accurate, making it a suitable measure for rewarding the agents during the interaction, especially in the herein presented ap-

Discussion of Limitations
There are two (potential) limitations that need to be discussed for the sake of completeness.
Argument content not considered in the emotion policy. In Sect. 3.2.2, we presented the adaptation approach of the emotion policy $\pi_{emotion}$. Apart from the index of the current agent, the RL state space (Definition 5) consists of a triple of the stance, the relation, and the sentiment score of the selected argument. At the current stage of this work, the emotional policy is thus learned largely independently of the argumentation policy, and the content of the argument is not considered at all. However, there might be an interaction between the content of the argument and the expressed emotion. The issue is that real-time adaptation requires a small state space [27]. We therefore limited the state space to these three dimensions. To include at least some content-related information, we added the sentiment score, which, however, only describes the negativity of the argument and not its actual content. To the best of our knowledge, it is not possible to assign a single score to an argument that describes its content. Taking the content into account would therefore require including all arguments in the state space, so that the agent learns a policy that considers the content of the argument. This would exponentially increase the state space and is therefore not suitable for real-time adaptation. Another possible solution could be to add a score describing the quality of the argument, obtained with automatic argument quality assessment techniques such as those recently proposed by Toledo et al. [22]. In summary, additional research and studies are needed to overcome this limitation or to show that the proposed state space contains sufficient information to successfully learn a consistent emotional policy.
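A quick count makes the size argument concrete. Assuming two agents, two stances, two relations, three sentiment bins, and a hypothetical structure of 20 arguments, adding the identity of the current argument multiplies the number of table entries by |Args|, whereas tracking which arguments have already been presented multiplies it by 2^|Args|, i.e., exponentially:

```python
def state_space_sizes(n_agents=2, n_stances=2, n_relations=2, n_bins=3, n_args=20):
    """Number of discrete emotion states under different (hypothetical) designs."""
    base = n_agents * n_stances * n_relations * n_bins
    return {
        "current": base,                            # (i, stance, relation, score bin)
        "plus_current_argument": base * n_args,     # one extra factor |Args|
        "plus_presented_set": base * 2 ** n_args,   # every subset of Args
    }


sizes = state_space_sizes()
```

With these assumptions the table grows from 24 states to 480 and to 25,165,824 states, respectively, far beyond what can be explored within a single interaction.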

Perception of arguments depends on the agent's focus.
Within the experiment to validate the prediction model, all arguments were presented by a single agent in a user-directed manner (uni-directional persuasion [8]). However, there is evidence from the literature that multiple agents appear more persuasive overall than single agents, provided they maintain a consistent argumentation strategy. This effect is notably increased by vicarious bi-directional persuasive agents [9] (in which one agent tries to persuade the other agent(s) and thereby indirectly persuades the observer of the dialog [8]). Consequently, one might wonder whether the evaluated prediction model performs equally well in the proposed multi-agent scenario, since the persuasive outcome could be different. However, we argue that since our model depends only on direct user feedback, and since each argument (not just the whole debate) would be perceived differently in the first place, the received user feedback would differ accordingly, and thus the predicted stance of our model would implicitly take such effects into account. Nevertheless, a follow-up study would be worthwhile to verify this assumption.

Summary and Outlook
Within this work, we have provided an overview of the EVA project, which aims at synthesizing and combining different aspects of argumentation. Our proposed system comprises two interacting agents represented by virtual avatars. The argumentative aspects addressed in this paper cover (1) learning an optimal logical strategy within a dialog game framework from objective optimization criteria and (2) real-time adaptation of the emotional presentation of arguments based on explicit user feedback and a computed prediction of the current user's stance over time. Our results showed the predictive power of the proposed prediction model and that the trained logical policy performs comparably to the optimal policy based on probabilistic rules. In future work, we will explore different techniques to implicitly estimate the persuasive effect of the presented arguments on the user without explicit feedback, based on the approaches discussed in [19] and the prediction model presented in [28]. Since argumentation as a whole is highly subjective [10], we aim to extend the system so that the agents adapt both investigated aspects (how and what) to the user simultaneously in order to determine the most effective combination of the two. Thus, we will combine both aspects of the strategy and learn them simultaneously. To this end, we will explore different approaches, including function approximation and fine-tuning of pre-trained strategies. Finally, we aim to compare the respective outcomes with those achieved by the current separated approach.
Funding Open Access funding provided by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.