Combining Social Choice Theory and Argumentation: Enabling Collective Decision Making

Argumentation-based debates are mechanisms that a group can use to resolve conflicting opinions and hence reach agreement. They have many potential applications in on-line communities and other open environments. In this paper, we provide computational infrastructure to support argumentation-based debates, in particular focusing on the problem of how participants in a debate can reach agreement about the outcome of the debate, given all the statements that have been made. Our approach makes it possible to represent arguments that are put forward by the participants in a debate, allows both positive and negative relationships between the arguments to be represented, and makes it possible for participants to express opinions about both the arguments and the outcome of the debate. Our main contribution is to provide a novel method—indeed the first method—for computing the collective decision that emerges from the combination of a set of arguments and a set of opinions about whether the arguments hold or not. To do this, we carry out a formal investigation of a family of aggregation functions. This family starts with a function that is firmly rooted in the social choice literature, and is extended with functions that are more oriented towards the use of argumentation. We prove that to ensure that the collective decision is coherent, a property that we think is essential, an aggregation function needs to take into account the dependencies between arguments. We also provide an empirical analysis of the performance of our approach to reaching a collective decision, showing that a collective decision can be reached for debates, of the size that one currently finds online, in reasonable time.


Introduction
Across the world there is an increased interest in the process of using technology to provide a route for the greater engagement of citizens in the governance of their communities. The idea is that thanks to the use of so-called e-governance or, more specifically, e-participation systems, individuals will be able to have a greater say in the way that they are governed (Weerakkody and Reddick 2012), and decisions taken by their elected representatives will more closely reflect the views of the represented (Dawes 2008). For example, both Barcelona and Reykjavík municipalities have opened up some aspects of their policy making to citizens through the use of online portals. These portals, Decidim Barcelona (2017) and Better Reykjavík (2017), respectively, allow individual citizens to put forward policy proposals, state their support for proposals made by their peers, and debate the pros and cons of proposals made by themselves and by others.
Decidim and Better Reykjavík, by their restriction to the cities in question, focus on local issues and have limited reach into the broader communities. However, there is no reason why such experiments in participatory democracy have to be geographically limited. Indeed, the ambitious Parlement et Citoyens project in France (2017) aims to scale this kind of activity up to the national level, allowing citizens and deputies to collaboratively draft proposals for legislation.
All these efforts can be seen as extensions of earlier e-governance efforts, such as the UK's online petition site (Petitions 2017), along the lines of online collaboration platforms intended to support small group discussions. The UK petition site allows citizens and residents to request that parliament consider a topic. It is a one-way mechanism: if a petition collects 10,000 signatures, then the government responds. If a petition collects 100,000 signatures (as did the recent petition to deny Donald Trump a state visit), then parliament debates the issue. However, there is no ability for citizens to directly discuss issues amongst themselves, nor is there a facility to engage in debate with lawmakers. It is this ability to engage in structured debates that, for us, is the key element in Decidim, Better Reykjavík and Parlement et Citoyens. The structure comes from an initial proposal, which participants can then subject to scrutiny, offering arguments for and against the central issue. This is a structure that they have in common with sites like Quoners (2017) and consider.it (2017), which are not tied to a particular institution, and tools like the Deliberatorium (Klein 2012, 2017).
The work described in this paper is inspired by these kinds of e-participation systems, and explores computational mechanisms for evaluating the output of these systems. In other words, we define and formally analyze a computational mechanism that can take the output of an e-participation system (a number of arguments about a proposal and opinions about whether or not those arguments have value) and establish what the balance of opinion is. Indeed, the scenario that we investigate is somewhat more general than that supported by the systems mentioned above. That is because existing e-participation systems are limited to either providing a list of arguments for a proposal and a list of arguments against, so that there are no relationships between the arguments in the lists, or providing a forum-like setting where arguments are structured in a tree. In contrast, we allow for a more general discussion.
In particular, in this paper we do the following:
- We introduce a novel, formal framework, which we call a target-oriented discussion framework, that can support discussions about whether some proposal should be accepted or not. (The proposal is the "target".) This allows participants in a debate to put forward arguments for and against the target, and to indicate the relationships between those arguments. (If argument a is in favour of the target and argument b is against it, then a and b may conflict, and our framework allows this to be recorded.)
- We provide the means for individuals to express their opinions about the proposal and the arguments that have been put forward in our framework (both arguments that they have put forward and arguments put forward by others), along with a method that can assess whether or not the set of opinions as a whole is reasonable. This notion of reasonableness is established by the formal notion of a coherent labelling. A coherent labelling can be thought of as a relaxed variant of the standard argumentation notion of a complete labelling (Baroni et al. 2011) which provides further flexibility in expressing opinions. 1
- We introduce and formally evaluate a family of aggregation functions that take a set of arguments and opinions about those arguments and return the collective decision about the target. We investigate the properties of these aggregation functions by borrowing from, and extending, the adaptation of classical properties from social-choice theory to the domain of argumentation that was carried out in Awad et al. (2017). We find that the aggregation functions that we introduce span a range of properties (summarised in Table 3), illustrating the trade-offs between those properties. We prove that two of the functions guarantee that the outcome satisfies the property of coherent collective rationality, meaning that they generate a coherent labelling, so that the collective opinion from the discussion is coherent, and this is the case even when aggregating individual opinions that may not be coherent.
As the terminology we have used so far suggests, our work draws both on argumentation theory and social choice theory. We consider that the statements made by participants in a discussion are arguments. That is, they are statements that include a conclusion and the reason why that conclusion is considered to hold. (However, like much work in argumentation (Dung 1995; Modgil and Caminada 2009), we will deal with them at a purely abstract level, meaning that we treat them as if they were atomic objects with no internal structure.) As in standard argumentation theory (Dung 1995), we consider that arguments can conflict, that is, that they attack each other, for example if their conclusions are contradictory, but we also consider that arguments may defend one another, as in Cayrol and Lagasquie-Schiex (2005b). Unlike most work in argumentation theory, we are not directly concerned with computing the acceptability of the arguments in the discussion using some version of the standard semantics (Baroni et al. 2011; Baroni and Giacomin 2009). Rather, we are interested in establishing which arguments the participants in the discussion consider to hold, 2 and doing this is where the ideas from social choice theory come in.
We consider the problem of merging opinions about arguments in a debate to be an instance of collective decision-making as studied in social-choice theory. Given a set of agents, and a set of arguments about a topic, where each agent may have an opinion about whether or not the arguments hold, we are interested in how the agents can, as a group, reach a decision about the topic. To deal with this situation, we consider a family of novel functions which aggregate the opinions of the agents to compute the overall opinion on the topic. Therefore, our work tackles the same problem posed by Awad et al. (2017), and we also encode agents' opinions about arguments using a set of non-binary labellings, just as in Awad et al. (2017). However, our work takes an important step beyond Awad et al. (2017), since we establish the overall labelling over the set of arguments without assuming independence between arguments as Awad et al. (2017) do (and as the judgement aggregation and preference aggregation literature in general does). Instead we adhere here to the important observation made in Awad et al. (2017), where the authors note that assuming independence is questionable because (as noted above) it is natural for a set of arguments to have support and attack relationships between them. Since these relationships exist, it is logical for the process of merging opinions about arguments to take these relationships between the arguments into consideration in some form. That is what two of our aggregation functions do, guaranteeing that the resulting aggregated opinion is coherent, that is, in some sense, free of contradictions. Coherence, in the sense we use the term, is a weaker condition than the conflict-freeness that is the standard minimum condition in argumentation theory, and our future work will look to establish whether other aggregation functions can attain a conflict-free set of opinions.
From a social choice perspective, it is important to note that unlike in the literature on judgement aggregation and preference aggregation, we do not impose the condition that opinions satisfy any property from which our aggregation functions can benefit to guarantee collective rationality. The rationale for this is clear, and the same as the rationale for not insisting on any conditions on the set of labellings that are given to the arguments by participants: since the agents that are involved in the discussions are humans, we cannot assume that they will have rational opinions. That is because we know that humans are frequently inconsistent when expressing their opinions, often contradicting themselves. We therefore believe that not assuming rationality is essential for our aggregation operators to be capable of being used in realistic settings.
Organisation. The paper is structured as follows. Section 2 surveys related work. The next two sections characterise (Sect. 3) and formalise (Sect. 4) our novel multi-agent discussion framework. Then, Sect. 5 details both the decision problem that we study and the desired properties of aggregation functions; Sect. 6 introduces a family of novel aggregation functions and studies their social-choice properties; Sect. 7 provides an algorithm for computing the collective decision of a discussion framework using the functions from Sect. 6 and tests how long it takes to compute a collective decision for realistically-sized discussion frameworks; and Sect. 8 draws conclusions and plans future research.
2 [...] will differ from those that the standard semantics will generate. One might therefore consider the results irrational in some sense. However, as we discuss at length below, they are rational in the sense of (some aspects of) social choice theory. And we believe that highlighting the differences between the results as established by argumentation and the results as established by social choice theory, as our work will do in the long term, will be to the benefit of both those who study argumentation and those who study social choice theory.

Related Work
We identify several broad research areas with connections to the topics that we discuss in this paper. These include tools for online discussion, computational argumentation, and social choice theory.

Tools for Online Discussion
As mentioned above, we see this work as being inspired by work on online discussion forums such as Decidim Barcelona (2017), Better Reykjavík (2017) and Parlement et Citoyens (2017), where participants can carry out a structured discussion around some topic, typically a policy proposal. These particular tools just allow participants to offer arguments for and against a proposal, and only in the context of a specific institution. Other approaches have extended the scope of these tools. One direction is in developing tools that are not tied to a specific institution. In this category of non-institutional tools we find Quoners (2017) and consider.it (2017), which we mentioned above, and Appgree (2017) and Baoqu (2017), where the main focus is on scalability, that is, making the systems fit for use by large numbers of participants. Another direction is that of allowing participants to do more than just comment. Here we have the example of Jackson and Kuehn (2016) and Loomio (2017), where participants can both comment on proposals, albeit in an unstructured way, and also vote on them. What distinguishes our work from all of these approaches is that we aim to support discussions that are more than just structured: they are argument-based, and we take the interaction between the arguments into account. 3 There are other approaches that allow for structured argument-based discussions. Most notable here is Klein's work on the Deliberatorium (Klein 2012; Klein and Convertino 2015), which allows for the presentation of arguments and their interactions. The Deliberatorium is part of a long line of work that allows reasoning about complex scenarios to be structured in terms of arguments for and against options. Other work in this line includes Carr (2003), Reed and Rowe (2004), Suthers et al. (1995) and Van Gelder (2003), where the focus is more on drawing the relationships between arguments as a means of helping people understand the scenarios.
Our work differs from these approaches in its attempt to provide computational methods to summarise the information that has been put forward. In other words, our focus is on using the results of debate as input to a computational process, rather than providing support for the debate itself. In that sense our work could be viewed as a post-processing stage that could be applied in conjunction with any of the tools to support structured discussion.

Computational Argumentation
Computational argumentation (Rahwan and Simari 2009) has a lengthy history within artificial intelligence. At the time of writing, it is hard to overstate the influence of the work of Dung (1995), which introduced both the idea of studying argumentation at the abstract level, that is, without considering the structures from which arguments are constructed, and the idea of using argumentation as a way of establishing a consistent viewpoint from an inconsistent set of data. 4 Dung (1995) provided a number of methods ("semantics", as they are called) to extract a consistent set of arguments from a set of arguments that conflict with one another, and inspired much subsequent work on abstract argumentation systems (Baroni and Giacomin 2009; Modgil and Caminada 2009; Vreeswijk 1997). This includes work on bipolar argumentation (Amgoud et al. 2008; Cayrol and Lagasquie-Schiex 2005b), which includes a "support" relation between arguments.
However, work on abstract argumentation is only one aspect of work on argumentation. In fact, Dung (1995) was pre-dated by work that looked at decision-making as a process of putting forward reasons (arguments, though they were not called that at the time) for and against a particular conclusion (Fox and Bardhan 1980). This approach was then refined into systems of argumentation such as those of Fox et al. (1993) and Krause et al. (1995). Such systems were precursors of work where the internal structure of an argument is important: logic-based argumentation (Besnard and Hunter 2001), assumption-based argumentation (Dung et al. 2006) and structured argumentation systems such as ASPIC+ (Modgil and Prakken 2013) and DeLP (García and Simari 2004). In these more subtle forms of argumentation, the focus is often still on establishing consistency. The difference from abstract argumentation is just that they do not consider arguments as primitive objects; rather, arguments are constructed from sentences in some language.
Another line of work in computational argumentation, separate from that on establishing consistency, is that on argument accrual (Besnard and Hunter 2001;Cayrol and Lagasquie-Schiex 2005a;Fox and Bardhan 1980;Prakken 2005;Verheij 1995). Accrual involves the "summing up" of arguments, with the idea of establishing the strongest argument, sometimes in the face of arguments for and against some option, typically with the aim of being able to decide between some alternatives.
The work mentioned above uses argumentation as a mechanism for a single entity to come to a conclusion. However, as Sycara (1990), Walton and Krabbe (1995) and others have pointed out, argumentation is also a natural mechanism for multiple entities to use to reach consensus on some point. As a result, argumentation has been used (Amgoud et al. 2000; McBurney and Parsons 2009) in multiagent systems as a mechanism for rational interaction (McBurney 2002), for a particular meaning of "rational": each stage in the interaction is supported by well-founded reasons. Here we build upon this prior work in rational interaction. Our approach allows agents to put forth arguments about some topic under discussion, whether in favour of or against the topic.
Our work connects to several of these themes in argumentation. First, since we are interested in arguments from a number of participants, our work is clearly related to the use of argumentation in multiagent interaction. As we will see below, just like Amgoud et al. (2000) and subsequent work, we assume a particular protocol for arguments to be posed, and we are interested in being able to compute the outcome of a discussion taking into account the arguments put forward by multiple participants.
Second, our work connects with the idea of argumentation as a means of extracting a coherent view from a number of conflicting arguments. One commonly used approach to doing this is the labelling approach (Baroni et al. 2011), which attaches the labels in (for arguments that should be accepted), out (for arguments that should be rejected) and undec (where the status cannot be decided). In this paper we borrow the idea of the labelling, but rather than finding one or more consistent labellings from the relations between arguments, we allow participants to indicate which labels they think apply to which arguments. In other words, we take participants' votes on what label should apply to what argument as input and from them compute a consensus labelling, where the consensus labelling need only be, in our terminology, "coherent". Coherency is formally defined in Sect. 4.3, but informally we can say that a labelling is coherent if every argument that is labelled in has more arguments for it than against it, and if every argument that is labelled out has more arguments against it than for it. Coherency is a weaker requirement than the consistency applied by the standard approaches in argumentation (Baroni et al. 2011), thus setting our work apart from that on merging argumentation systems, for example Coste-Marquis et al. (2007), which looks to construct one or more consistent merged labellings from several different consistent labellings.
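The informal reading of coherence given above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the formal definition of Sect. 4.3: the function name `is_coherent`, the representation of relations as sets of (source, destination) pairs, the choice to count only supporters and attackers that are themselves labelled in, and the exemption of arguments with no incoming relations are all assumptions made here for the sake of the example.

```python
from typing import Dict, Set, Tuple

Relation = Set[Tuple[str, str]]  # (source, destination) pairs; hypothetical encoding


def is_coherent(labelling: Dict[str, str], supports: Relation, attacks: Relation) -> bool:
    """Check the informal coherence condition for every labelled argument."""
    for arg, label in labelling.items():
        supporters = {src for (src, dst) in supports if dst == arg}
        attackers = {src for (src, dst) in attacks if dst == arg}
        if not supporters and not attackers:
            continue  # assumption: an argument with no incoming relations may take any label
        # Assumption: only arguments labelled "in" count for or against.
        pro = sum(1 for s in supporters if labelling.get(s) == "in")
        con = sum(1 for s in attackers if labelling.get(s) == "in")
        if label == "in" and pro <= con:
            return False  # accepted, but not more arguments for than against
        if label == "out" and con <= pro:
            return False  # rejected, but not more arguments against than for
    return True


# Hypothetical three-argument discussion: p is for t, q is against t.
supports = {("p", "t")}
attacks = {("q", "t")}
coherent = is_coherent({"t": "in", "p": "in", "q": "out"}, supports, attacks)
incoherent = is_coherent({"t": "in", "p": "out", "q": "in"}, supports, attacks)
print(coherent, incoherent)  # True False
```

Note that an undec label is never constrained in this sketch, which matches the informal statement: only in and out labels carry a condition.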
Considering the input labellings as votes from human participants places our work in close relation to that of social argumentation (Leite and Martins 2011), though, unlike that work we take as input votes on the status of arguments rather than votes on the strength of the relation between arguments.

Social Choice Theory
Given a set of alternatives and a set of agents who possess preference relations over the alternatives, social choice theory focuses on how to yield a collective choice that appropriately reflects the agents' individual preferences (Aziz et al. 2017). With this aim, social choice theory has extensively explored many ways of aggregating agents' individual preferences (Gaertner 2009). Since there is a consensus in the literature on the desirable properties that a "fair" way of aggregating preferences should satisfy (e.g. no single agent can impose their view on the aggregate; if all agents agree, the aggregate must reflect the agreement; etc.), aggregation functions can be characterised and compared in terms of the desirable properties they satisfy. Notice though that social choice theory includes multiple negative results, namely impossibility results showing the incompatibility of certain sets of desirable properties (e.g. Arrow's famous impossibility theorem; Arrow and Sen 2002).
The work in this paper is in the vein of Awad et al. (2017). There, the authors pose the very same problem that we tackle here: given a topic under discussion and a set of agents expressing their individual opinions about the arguments in the discussion, how can the agents reach a collectively rational decision? Like Awad et al., we consider that reaching a collective decision is a judgement aggregation problem in which the aggregation of opinions must satisfy desirable social choice properties. A further similarity with Awad et al. (2017) stems from the way we encode opinions (subjective evaluations). Indeed, notice that while in judgement aggregation each proposition may take on one of two values (True or False), here, when aggregating labellings, each argument can take on one of three values (in, out, and undec). Therefore, aggregating labellings, as we do it here, has more in common with non-binary evaluations (Dokow and Holzman 2010).
Notwithstanding the similarities with Awad et al. (2017), there are several important differences with respect to that work and to the judgement aggregation literature as a whole. First of all, and very importantly, we do not assume independence between arguments as a fundamental postulate as is the case in Awad et al. (2017). As admitted by Awad et al., the necessity of independence is questionable because of the dependencies between arguments that come already encoded in the form of relationships such as attack. Despite that, they opt to stick with independence to keep open the possibility of proving strategy-proofness. Thus, they follow the usual methodology in judgement aggregation, though they do not establish the relation between independence and strategy-proofness. Indeed, independence is a fundamental property in the judgement aggregation literature because of its theoretical value in proving strategy-proofness and strategic manipulation. If the independence criterion is not satisfied, then the function aggregating judgements is not immune to strategic manipulation (Dietrich and List 2007). However, independence is not always upheld. On the one hand, from a theoretical point of view, independence is regarded as too strong a property, since, together with mild further conditions, it implies dictatorship. Furthermore, it is also considered not very plausible (Mongin 2008). Hence, the theoretical and computational benefits of relaxing independence have been the subject of much research (see e.g. Dietrich and Mongin 2010; Lang et al. 2016; Mongin 2008; Pigozzi et al. 2008).
Against this background, and given that dependencies do exist between arguments, our work departs from and goes beyond Awad et al. (2017) by dropping independence. Thus, the aggregation functions that we introduce in this paper exploit dependencies between arguments and combine agents' opinions to yield an aggregated opinion. To the best of our knowledge, we are the first to take this step in a multi-agent argumentation context. 5
A second, major difference has to do with the approach chosen to achieve collective rationality. Here we focus on designing novel aggregation functions that exploit dependencies between arguments to ensure collective coherence. Instead, Awad et al. are concerned with characterising the restrictions that are necessary so that the plurality rule, a well-known voting function in the literature, produces collectively rational outcomes. Notice that Awad et al. study social choice properties satisfied by the plurality rule by adapting various classical social-choice theoretic properties (see e.g. Arrow and Sen 2002; Arrow et al. 2010) to the argumentation domain. Here we borrow some of those properties to study our aggregation operators. Nonetheless, since some of those properties assume independence and we do not, we define further social choice properties that take into account dependencies between arguments.
More recently, in the intersection of social choice theory and argumentation, we find the interesting work in Rago and Toni (2017). Similarly to our work, the QuAD-V framework in Rago and Toni (2017) allows pro and con arguments (attackers and defenders in our terminology) and agents' votes over arguments (labels). Nonetheless QuAD-V does not allow arguments to be attackers and defenders at the same time. Although Rago and Toni (2017) propose the QuAD-V algorithm to determine a collective decision from multiple opinions by exploiting the dependencies between arguments, their goal is rather different from ours. Thus, they focus on the debate procedure (opinion polling) to ensure that, at the end of the debate, the agents contribute with individually rational opinions, a weaker version of our notion of coherent labelling. Instead, our focus is the design of aggregation functions that satisfy desirable social choice properties, particularly collective rationality (strict rationality in Rago and Toni's terms), without requiring agents' individual rationality. Along this line, notice also that the social choice properties of the QuAD-V algorithm are not investigated.
Finally, notice that unlike the literature on judgement aggregation and preference aggregation, in this paper we will not impose any particular properties on opinions from which our aggregation operators can benefit to guarantee collective rationality. Note that this is the case, for instance, for some aggregation functions in the judgement aggregation literature. For example, among distance-based aggregators, the Kemeny rule (Endriss and Moulin 2016) only considers consistent judgement sets, and hence disregards those which are not, and premise-based aggregators (Endriss and Moulin 2016) typically make assumptions on the agenda to guarantee consistency and completeness. In contrast to that literature, some of the aggregation operators introduced in this paper guarantee collective rationality independently of opinions' properties. As discussed above, the rationale for this is clear: we must disregard rationality when humans are involved in debates because their opinions may show contradictions and inconsistencies.

Introducing our Discussion Framework
Overall, we consider a situation where several individuals try to reach some consensus on a given issue. We refer to this issue or topic as the discussion target. During the discussion process, individuals provide arguments in favour of or against this topic (or other arguments) in an orderly manner. Notice that although the example used throughout this paper considers a norm as the topic under discussion, this need not be the case. Indeed, we can imagine any of the dialogues discussed in Parsons et al. (2003), for example, to be discussions about a target which is the subject of the first statement to be made in the dialogue. Putting forward arguments, which may either be directed towards the target, or to arguments that have previously been put forward, is one way in which participants in the discussion can make their points of view known. In addition, participants are able to express their opinion on the target as well as to indicate which arguments they find acceptable or not. Sect. 4 is devoted to formalising this setting, which, as described in Ganzer-Ripoll et al. (2017a), we name a target-oriented argumentation framework.
We admit this is clearly a rather restricted notion of a discussion, not least because of the restriction to a single target, and many real-world discussions would not be encompassed by it. Furthermore, as we shall see in Sect. 4, what we formalise is even simpler, because we insist that any statement made after the target is an argument, and this argument has to relate to the target and/or to previous arguments. However, despite this simplicity, a target-oriented discussion allows more complex discussions than any of the existing, implemented, discussion frameworks discussed above. Thus, while we may need to extend the target-oriented discussion framework if we want to capture the full richness of human discussions, what we have here is already a considerable step beyond what currently exists.
Within a target-oriented argumentation framework, we distinguish two relationships between arguments: one argument can be for another argument, or it can be against another argument. These possible relationships between arguments are those discussed in Besnard and Hunter (2001). 6 Notice that for and against relationships are binary and directed. Moreover, they are mutually exclusive. In addition, in order to allow participants in a discussion to show their opinion of existing arguments, we make use of the notion of labels for arguments. Whereas in standard argumentation, labels are derived from the structure of the set of arguments (Baroni et al. 2011), in our approach a set of labels is assigned by each participant in the discussion. Every label is either in, out or undec. Participants assign an in label to the target or an argument in order to indicate that they accept it. Conversely, they assign out to signal rejection. Finally, an undec label denotes indecision, which may arise in two different situations. Firstly, this label can be used to indicate that a participant is doubtful about which of the two options (i.e., in or out) to choose. Secondly, it also covers situations where participants simply miss the opportunity of assigning a label (or, in other words, of providing their opinion about the target or an argument). Such situations seem to be rather realistic in human debates, as we can hardly expect participants to label absolutely all discussion elements.
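The ingredients described above (a target, arguments, directed and mutually exclusive for and against relations, and three labels with undec standing in for a missing opinion) can be captured in a small data structure. The following Python sketch is one possible encoding rather than the formalisation of Sect. 4; the names `Label`, `DiscussionFramework` and `complete_labelling`, and the decision to treat the target as a member of the argument set, are assumptions made here for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, Tuple


class Label(Enum):
    IN = "in"        # the participant accepts the argument
    OUT = "out"      # the participant rejects the argument
    UNDEC = "undec"  # the participant is undecided or gave no label


@dataclass(frozen=True)
class DiscussionFramework:
    """Hypothetical container for a discussion about a single target."""
    target: str
    arguments: FrozenSet[str]             # assumption: includes the target itself
    supports: FrozenSet[Tuple[str, str]]  # (a, b) means "a is for b"
    attacks: FrozenSet[Tuple[str, str]]   # (a, b) means "a is against b"

    def __post_init__(self) -> None:
        if self.target not in self.arguments:
            raise ValueError("the target must be one of the arguments")
        if self.supports & self.attacks:
            raise ValueError("a pair cannot be both 'for' and 'against'")
        for src, dst in self.supports | self.attacks:
            if src not in self.arguments or dst not in self.arguments:
                raise ValueError(f"unknown argument in relation ({src}, {dst})")


def complete_labelling(framework: DiscussionFramework,
                       partial: Dict[str, Label]) -> Dict[str, Label]:
    """Fill in undec for every argument the participant did not label."""
    return {arg: partial.get(arg, Label.UNDEC) for arg in framework.arguments}


framework = DiscussionFramework(
    target="t",
    arguments=frozenset({"t", "p", "q"}),
    supports=frozenset({("p", "t")}),
    attacks=frozenset({("q", "t")}),
)
labelling = complete_labelling(framework, {"t": Label.IN, "q": Label.OUT})
print(labelling["p"])  # Label.UNDEC
```

Defaulting to undec in `complete_labelling` mirrors the second reading of undec given above, where a participant simply does not label an element of the discussion.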
Once participants have allocated labels to the target and arguments, we have a number of sets of labels. In order to reach a consensus on whether the target is accepted or rejected, we need to aggregate the sets of labels. This is the main contribution of this paper, investigating how to aggregate all the legitimate and subjective opinions of the participants, expressed as labellings, into a single collective labelling. Once we can aggregate all labellings, then we will be able to assess whether participants as a whole accept, reject, or fail to reach a clear decision about the topic (i.e., the target) under discussion.
In establishing suitable aggregation functions, we have to take into account that we are dealing with human providers of labels, and so cannot expect that the labels are assigned in a rational manner: contradictions or inconsistencies in assigning labellings may occur when expressing opinions. Despite allowing individual labellings to be irrational, we still aim at designing aggregation functions that are able to combine these "imperfect" individual labellings into a "reasonable" agreed opinion. We intuitively characterise "reasonable" by the notion of coherent labelling (Ganzer-Ripoll et al. 2017a) and other desirable properties. The next section formally introduces the notion of coherence, and subsequent sections study how to define aggregation functions that yield a single aggregated labelling satisfying several desirable properties, including coherence.
Having introduced the concept of a target-oriented discussion framework, we introduce a simple example that will allow us to illustrate some of the ideas in the paper.
Example 1 (Neighbours' debate) Suppose Alan, Bart, and Cathy are neighbours and they aim to reach an agreement on the following norm (N): "Neighbours should take fixed turns at 6 a.m. for cleaning leaves in the street". They pose three different arguments: a_1 = "The schedule is too rigid"; a_2 = "6 a.m. is too early"; and a_3 = "Fair task distribution". Notice that arguments a_1 and a_2 are against N whereas a_3 is for it, and that a_2 is in favour of a_1, since someone who wakes up later would prefer to change the schedule. Making explicit both these arguments and their relations allows Alan, Bart, and Cathy to start sharing their opinions: they can indicate whether they think each argument should be accepted or rejected, or whether they have no opinion about it. On the one hand, Alan (shown as Ag_1 in the first row of Table 1) loves getting up late, and so he rejects norm N by assigning an out label to the target and accepts arguments a_1 and a_2 by labelling them as in. However, he concedes argument a_3, so he also labels it as in. On the other hand, Bart (Ag_2 in the second row of Table 1) is used to getting up early and is clearly in favour of norm N. Consequently, he accepts both norm N and argument a_3 and rejects arguments a_1 and a_2, which are against N. Finally, Cathy (Ag_3 in the third row of Table 1) is keen on routines, and thus she accepts norm N and argument a_3 and rejects argument a_1. Nevertheless, she likes to get up at 7 a.m., so she accepts a_2.
Given this situation, the question that arises, and which this paper answers, is: should the neighbours agree to accept this street cleaning norm? or, in other words: how should they aggregate their individual opinions into a consensual one?

The Target-Oriented Discussion Framework
The debate between neighbours in the previous section exemplifies the key concepts of our discussion framework. In this framework, a norm N is the target of the debate between multiple agents. Agents can put forward arguments relating to the target or to other arguments and can express their opinions on those arguments together with the target of the debate. In this section, in addition to introducing formally the key concepts of our framework, we define opinions not presenting inconsistencies as coherent. In particular, we introduce the target-oriented discussion framework in Sect. 4.1, the agent's labelling representing the agent's opinions in Sect. 4.2, and our coherence notion in Sect. 4.3.

Formalisation of the Target-Oriented Discussion Framework
We aim to define a formal framework capturing both for and against relations between arguments. In this sense our work has some similarities with work in bipolar argumentation frameworks (Amgoud et al. 2008; Cayrol and Lagasquie-Schiex 2005b) and work on argument accrual (Besnard and Hunter 2001; Prakken 2005; Verheij 1995). The motivation behind including arguments for the target and for other arguments is given by recent work on humans participating in large-scale argumentation frameworks (e.g., Klein 2012; Klein and Convertino 2015). These works allow human participants to express both for and against relationships between arguments. Within our framework, we aim to provide that expressiveness. Our desire to capture human uses of argumentation also explains many of the differences between our system and those in the literature; this was explored in more detail in Sect. 2. In what follows we use the term "attack" to express the existence of an "against" relationship between two arguments, as is common in the argumentation literature. We also use the term "defence" to express the existence of a "for" relationship between two arguments. We do not use the term "support" for this positive relation between arguments, to stress the difference between our work and bipolar argumentation frameworks.
To simplify the formal analysis, we provide some restrictions on the way that a discussion unfolds. In other words, we insist that discussions follow a particular protocol with the following steps:
1. One agent puts forward the target of the discussion. While any agent is allowed to start a discussion by putting forward a target, only one target is allowed per discussion.
2. Any agent is then allowed to put forward an argument in favour of, or against, the target and/or any arguments that have already been put forward. This process continues until no agent has any further arguments to put forward.
3. Agents express their opinions about whether the arguments that have been put forward hold, or whether those arguments do not hold, by assigning in, out or undec labels to the arguments. Agents are not required to have an opinion about whether every argument holds or not: they are allowed to not express an opinion about any given argument, but [...]
Before defining the structure that results from the first two steps of this process, we define a more general structure, the discussion framework.
Definition 1 (Discussion framework) A discussion framework is a triple DF = ⟨A, →, ⊩⟩, where A is a finite set of arguments, and → ⊆ A × A and ⊩ ⊆ A × A are disjoint attack and defence relationships (i.e., → ∩ ⊩ = ∅). We represent that argument b ∈ A attacks argument a ∈ A as b → a, and that b defends a as b ⊩ a.
A discussion framework can be also modelled as a graph whose nodes represent the arguments and whose edges represent either attack or defence relationships between arguments. Figure 1 shows the graphical representation of attack and defence relationships.
Next, we define the concept of descendant to capture the indirect relationship existing between two arguments through a sequence of attack and defence relationships.
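Definition 2 (Descendant) Let DF = ⟨A, →, ⊩⟩ be a discussion framework and a ∈ A one of its arguments. We say that an argument b ∈ A is a descendant of a if b is related to a through a finite sequence of attack and defence relationships, that is, if there are arguments c_1, ..., c_r ∈ A (r ≥ 2) with c_1 = b and c_r = a such that, for each i < r, either c_i → c_{i+1} or c_i ⊩ c_{i+1}.
Given our notion of descendant, next we formalise a target-oriented discussion framework as having a target argument (e.g., a norm or proposal) as the main focus of the discussion.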
Definition 3 A target-oriented discussion framework TODF = ⟨A, →, ⊩, τ⟩ is a discussion framework satisfying the following properties: (i) for every argument a ∈ A, a is not a descendant of itself; and (ii) there is an argument τ ∈ A, called the target, such that for all a ∈ A \ {τ}, a is a descendant of τ.

Observation 1 The previous definitions allow us to identify some properties that further characterise a target-oriented discussion framework: [...]
Considering the previous definitions and observation, we can also infer the proposition below.
Proposition 1 Let TODF = ⟨A, →, ⊩, τ⟩ be a target-oriented discussion framework and E = → ∪ ⊩. The graph associated to a TODF, G_TODF = ⟨A, E⟩, is a directed acyclic graph, where A is the set of nodes and E the edge relationship.
Note the similarity between the graph structure of a TODF and the way that Proietti (2017) models debates using bipolar argumentation frameworks.
We can formalise the protocol for constructing a TODF as follows:
Definition 4 A discussion framework DF is constructed target-first if it is constructed according to the following rules: [...]
Note that in rule 2, b must attack or defend at least one argument in A and can attack or defend multiple arguments, but cannot attack and defend the very same argument at the same time.
The following proposition can be directly derived from Definitions 3 and 4.
Proposition 2 Any discussion framework DF constructed target-first will be a target-oriented discussion framework.
It is possible to construct a target-oriented discussion framework in a way that is not target-first, but in so far as we consider the construction of a discussion framework we will only consider target-first construction. Doing so not only ensures that the discussion framework is of a form that is easy to analyse (because it is acyclic), but it also fits with the way, sketched informally above, that existing discussion frameworks are used in practice.
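The target-first protocol can be sketched in code. The following is a minimal Python sketch, not the authors' implementation: the class name `DiscussionBuilder` and its method are hypothetical, and the rules it enforces are an assumption drawn from protocol steps 1 and 2 and from the note on rule 2 (one target opens the discussion; each later argument attacks or defends at least one argument already present, and never both attacks and defends the same argument).

```python
class DiscussionBuilder:
    """Hypothetical helper that grows a discussion framework target-first."""

    def __init__(self, target):
        # Step 1: exactly one target opens the discussion.
        self.target = target
        self.arguments = {target}
        self.attacks = set()    # pairs (b, a) meaning "b attacks a"
        self.defences = set()   # pairs (b, a) meaning "b defends a"

    def add_argument(self, new, attacks=(), defends=()):
        """Step 2: add `new`, related only to arguments already put forward."""
        attacks, defends = set(attacks), set(defends)
        if new in self.arguments:
            raise ValueError(f"{new!r} was already put forward")
        if not attacks and not defends:
            raise ValueError("a new argument must attack or defend something")
        if attacks & defends:
            raise ValueError("cannot attack and defend the same argument")
        unknown = (attacks | defends) - self.arguments
        if unknown:
            raise ValueError(f"unknown arguments: {sorted(unknown)}")
        self.arguments.add(new)
        self.attacks |= {(new, a) for a in attacks}
        self.defences |= {(new, a) for a in defends}


# The neighbours' debate, built in an order that respects the protocol.
debate = DiscussionBuilder("N")
debate.add_argument("a1", attacks=["N"])
debate.add_argument("a2", attacks=["N"], defends=["a1"])
debate.add_argument("a3", defends=["N"])
```

Under these assumed rules every new edge points from a new argument to an existing one, so no cycle can arise and every argument leads to the target, which is in line with Proposition 2.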
Example 2 (A formalization of the neighbourhood discussion) Figure 2a depicts the neighbours' target-oriented discussion framework. The nodes in the graph represent the set of arguments A = {N, a_1, a_2, a_3} in the example of Sect. 3, where N is the street cleaning norm, and a_1, a_2, a_3 are the rest of the arguments. Thus, N, the norm under discussion, is taken to be the target τ in our TODF. The edges represent the attack relationships a_1 → N and a_2 → N, and the defence relationships a_2 ⊩ a_1 and a_3 ⊩ N.
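As an illustration, the two conditions of Definition 3 can be checked mechanically on the neighbours' framework. The sketch below is a hedged example: the function name `is_todf` and the encoding of relations as sets of `(source, destination)` pairs are assumptions made for this illustration, and an argument b is treated as a descendant of a when a chain of attack or defence edges leads from b to a.

```python
from collections import defaultdict


def is_todf(arguments, attacks, defences, target):
    """Return True if conditions (i) and (ii) of Definition 3 hold."""
    if attacks & defences:
        return False  # attack and defence must be disjoint relations
    successors = defaultdict(set)
    for source, dest in attacks | defences:
        successors[source].add(dest)

    def reached_from(start):
        """Arguments reachable from `start` along one or more edges."""
        seen, stack = set(), list(successors[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(successors[node])
        return seen

    reach = {a: reached_from(a) for a in arguments}
    # (i) no argument is a descendant of itself
    acyclic = all(a not in reach[a] for a in arguments)
    # (ii) every non-target argument is a descendant of the target
    focused = all(target in reach[a] for a in arguments if a != target)
    return acyclic and focused


ARGS = {"N", "a1", "a2", "a3"}
ATTACKS = {("a1", "N"), ("a2", "N")}
DEFENCES = {("a2", "a1"), ("a3", "N")}
neighbours_ok = is_todf(ARGS, ATTACKS, DEFENCES, "N")
```

Adding an edge from N back to a_3, or an argument unrelated to N, makes the check fail, which matches the intuition that a TODF is acyclic and focused on a single target.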

Argument Labellings
Having formalised the notion of target-oriented discussion framework, in this section we introduce the agents' opinions, which we call argument labellings. In terms of the four-step protocol given above, this corresponds to step 3. Recall that step 3 involves agents expressing their opinions about the arguments in the discussion framework. Here we consider that each such opinion corresponds to a labelling in the sense of Baroni et al. (2011), Caminada (2006) and Caminada and Gabbay (2009).
That is, a labelling is an assertion about some or all of the arguments in the discussion framework being in one of three states: in, meaning that they are accepted by the agent expressing the opinion; out, meaning that they are not accepted by the agent expressing the opinion; or undec, meaning that the agent does not have an opinion as to whether they are in or out. Besides expressing uncertainty, the undec label represents the lack of an opinion. This feature is especially relevant in large-scale debates. As can be seen in Klein (2012), participants usually give their opinion about the arguments that interest them, but we cannot expect them to provide their opinions about all arguments posed within the context of a discussion.
Definition 5 (Argument labelling) Let TODF = ⟨A, →, ⊩, τ⟩ be a target-oriented discussion framework. An argument labelling for a TODF is a function L : A ⟶ {in, out, undec} that maps each argument of A to one of the following labels: in (accepted), out (rejected), or undec (undecided).
We denote by Ag = {ag_1, ..., ag_n} the set of agents taking part in a TODF, and by L_i the labelling encoding the opinion of agent ag_i ∈ Ag. We put together the opinions of all the agents participating in an argumentation as follows.
Definition 6 (Labelling profile) Let L_1, ..., L_n be argument labellings of the agents in Ag, where L_i is the argument labelling of agent ag_i. A labelling profile is a tuple L = (L_1, ..., L_n).
Example 3 (The opinions of the neighbours) Figure 2b graphically depicts Alan's, Bart's, and Cathy's labellings (denoted L_1, L_2, L_3 respectively), representing their opinions about the TODF illustrated by Fig. 2a.

Coherent Argument Labellings
As pointed out in Awad et al. (2017), there are several ways in which a labelling over an argument structure can be evaluated. In Awad et al. (2017), the authors use the notion of complete labelling (Baroni et al. 2011). A complete labelling requires that an argument is labelled in iff all the arguments which attack it are labelled out, and that an argument is labelled out iff at least one of the arguments that attack it is labelled in. The idea of a complete labelling starts with Dung (1995), and reflects the idea that a rational agent will label arguments consistently: thus an argument can only be accepted (in) if all of its attackers are not accepted (out), and so on. We believe that the restrictions imposed by complete labelling conditions are not suitable for human participation systems. Instead, we impose fewer conditions for a labelling to be classified as reasonable or coherent. Hence, given an argument a, we contrast the opinions about the argument, named the direct opinion, with the opinions about its immediate descendants, which we call the indirect opinion, and look for ways in which these may be made somewhat consistent.
Consider the neighbours' example in Fig. 2b. Given argument N, we take into consideration its assigned labels, i.e., its direct opinion L_1(N), L_2(N), and L_3(N), and the labels assigned to its descendants (a_1, a_2, and a_3), i.e., its indirect opinion. Similarly, for argument a_1, its direct opinion is formed by the labels assigned to a_1 and its indirect opinion is determined by the labels of its defending argument a_2.
Then, the labelling over an argument will be coherent if its indirect opinion agrees with its direct opinion, in other words, when the majority of labels in its indirect opinion are in line with its direct label. In the following, we formalise the notion of coherent labelling.
First, given an argument a we define its set of attacking arguments A(a) = {b ∈ A | b → a} and its set of defending arguments D(a) = {c ∈ A | c ⊩ a}. Hence, the labels attached to the arguments in A(a) ∪ D(a) form the indirect opinion of a.
Let L be a labelling and S a set of arguments. We denote the number of arguments accepted in S as in_L(S) = |{b ∈ S | L(b) = in}| and the number of rejected arguments as out_L(S) = |{b ∈ S | L(b) = out}|. Given this notation, the number of accepted defending arguments of a is in_L(D(a)) and the number of rejected defending arguments is out_L(D(a)). Similarly, the numbers of accepted and rejected attacking arguments are in_L(A(a)) and out_L(A(a)), respectively. We define the positive and negative support of the indirect opinion about an argument below.
Definition 7 (Positive support) Let a ∈ A be an argument and L a labelling on A. We define the positive (pro) support of a as: Pro_L(a) = in_L(D(a)) + out_L(A(a)). If Pro_L(a) = |A(a) ∪ D(a)| we say that a receives full positive support from L.
Definition 8 (Negative support) Let a ∈ A be an argument and L a labelling on A. We define the negative (con) support of a as: Con_L(a) = in_L(A(a)) + out_L(D(a)). If Con_L(a) = |A(a) ∪ D(a)| we say that a receives full negative support from L.
Observe that the positive support of an argument merges the accepted defending arguments with the rejected attacking ones, whereas the negative support merges the accepted attacking arguments with the rejected defending ones. Table 2 illustrates the positive and negative support for the arguments involved in the neighbours' example.
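To make Definitions 7 and 8 concrete, the following Python sketch computes the positive and negative support of an argument under one labelling. The function name `pro_con` and the encoding of the neighbours' debate are illustrative assumptions; the labels are taken from the description of Alan's, Bart's and Cathy's opinions in Example 1.

```python
ATTACKS = {("a1", "N"), ("a2", "N")}      # (b, a) means "b attacks a"
DEFENCES = {("a2", "a1"), ("a3", "N")}    # (c, a) means "c defends a"

L1 = {"N": "out", "a1": "in", "a2": "in", "a3": "in"}    # Alan
L2 = {"N": "in", "a1": "out", "a2": "out", "a3": "in"}   # Bart
L3 = {"N": "in", "a1": "out", "a2": "in", "a3": "in"}    # Cathy


def pro_con(labelling, attacks, defences, argument):
    """Return (Pro_L(a), Con_L(a)) for the given argument a."""
    attackers = [b for (b, a) in attacks if a == argument]
    defenders = [c for (c, a) in defences if a == argument]
    # Pro: accepted defenders plus rejected attackers.
    pro = (sum(labelling[c] == "in" for c in defenders)
           + sum(labelling[b] == "out" for b in attackers))
    # Con: accepted attackers plus rejected defenders.
    con = (sum(labelling[b] == "in" for b in attackers)
           + sum(labelling[c] == "out" for c in defenders))
    return pro, con


cathy_a1 = pro_con(L3, ATTACKS, DEFENCES, "a1")
bart_n = pro_con(L2, ATTACKS, DEFENCES, "N")
```

For Cathy's labelling, argument a_1 has positive support 1 and negative support 0. For Bart's labelling, N has positive support 3, which equals |A(N) ∪ D(N)|, so N receives full positive support from L_2.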
We now introduce our notion of coherence by combining the positive and negative support of an argument. We consider that a labelling is coherent if, for each argument, the following conditions are fulfilled: (1) if an argument is accepted, that is, it is labelled in, then its positive support has to be higher than its negative support; and (2) if an argument is rejected, that is, it is labelled out, then its negative support has to be higher than its positive support.
To finish, we define a more general, stronger notion of coherence that takes into account the difference between the positive and negative support.
Example 4 Now we apply this definition to the example in Fig. 2b. Table 2 shows that while labellings L_1 and L_2 are coherent, L_3 is not. L_3 is not coherent because the labelling is not coherent for argument a_1: while the direct opinion on the argument indicates rejection (L_3(a_1) = out), its indirect opinion indicates acceptance (its positive support (1) is greater than its negative support (0)). Thus only L_1 and L_2 belong to Coh(TODF), the class of coherent argument labellings of the TODF. Moreover, L_1 and L_2 are 0-coherent.
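The coherence conditions can be checked mechanically. The sketch below is a hedged illustration rather than the formal definition: the names `pro_con` and `is_coherent` are hypothetical, and it assumes that the two conditions apply only to arguments that have at least one attacker or defender. That assumption is needed for Example 4 to hold, since Alan accepts a_2 and a_3, which have no indirect opinion at all, and his labelling is nevertheless coherent.

```python
ATTACKS = {("a1", "N"), ("a2", "N")}
DEFENCES = {("a2", "a1"), ("a3", "N")}

L1 = {"N": "out", "a1": "in", "a2": "in", "a3": "in"}    # Alan
L2 = {"N": "in", "a1": "out", "a2": "out", "a3": "in"}   # Bart
L3 = {"N": "in", "a1": "out", "a2": "in", "a3": "in"}    # Cathy


def pro_con(labelling, attacks, defences, argument):
    """Positive and negative support of `argument` (Definitions 7 and 8)."""
    attackers = [b for (b, a) in attacks if a == argument]
    defenders = [c for (c, a) in defences if a == argument]
    pro = (sum(labelling[c] == "in" for c in defenders)
           + sum(labelling[b] == "out" for b in attackers))
    con = (sum(labelling[b] == "in" for b in attackers)
           + sum(labelling[c] == "out" for c in defenders))
    return pro, con


def is_coherent(labelling, attacks, defences):
    """Check conditions (1) and (2) on every argument with an indirect opinion."""
    for argument, label in labelling.items():
        has_indirect_opinion = any(a == argument for (_, a) in attacks | defences)
        if not has_indirect_opinion:
            continue  # assumption: arguments without attackers/defenders are unconstrained
        pro, con = pro_con(labelling, attacks, defences, argument)
        if label == "in" and not pro > con:
            return False
        if label == "out" and not con > pro:
            return False
    return True


coherence = [is_coherent(L, ATTACKS, DEFENCES) for L in (L1, L2, L3)]
```

On the neighbours' labellings the check accepts L_1 and L_2 and rejects L_3 because of a_1, in agreement with Example 4.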
We have now provided the machinery for agents to express their opinions about the arguments in a target-oriented argumentation framework, and so have all we need to support step 3 of the protocol in Sect. 4.1.

The Aggregation Problem
As stated above, our goal is to help agents reach a collective decision on the acceptance or rejection of a target. This corresponds to step 4 of the protocol in Sect. 4.1. In Sect. 5.1 we cast our goal as a judgement aggregation (List and Pettit 2002) problem that is solved by having a set of agents collectively decide how to label a target-oriented argumentation framework. We propose to solve this problem using an aggregation function that provides a label for the target and the arguments. Although labellings can be aggregated in different ways, here we follow Awad et al. (2017) in requiring that the outcome of an aggregation be fair. In particular, Sect. 5.2 defines a set of properties to analyse different aggregation functions.

Collective Labelling
First, we define our notion of discussion problem by putting together a TODF and the individual labellings of the agents involved in a discussion.
Definition 11 (Labelling discussion problem) A labelling discussion problem LDP is a pair ⟨Ag, TODF⟩, where Ag is a finite, non-empty set of agents, and TODF is a target-oriented discussion framework.
In our example, the labelling discussion problem is LDP = ⟨{ag_1, ag_2, ag_3}, TODF⟩. Our goal is to aggregate the individuals' labellings in an LDP to produce a labelling that represents the collective opinion in the discussion. Again, in our example, that would amount to aggregating L_1, L_2, L_3 into a single labelling.
In short, an aggregation function F outputs a single labelling from the opinions of the agents contained in a labelling profile. The resulting single labelling encodes the collective decision over the target and the arguments.
Definition 13 (Decision over a target) Given an aggregation function F for a labelling discussion problem ⟨Ag, TODF⟩ and a labelling profile L, the label F(L)(τ) stands for the decision over the target of the TODF.

Social Choice Properties
Social choice theory provides a collection of formal properties that make it possible to characterise aggregation methods in terms of outcome fairness (Dietrich 2007). Based on Awad et al. (2017), here we formally adapt some of these properties to characterise the desirable properties of an aggregation function in terms of both the arguments in a discussion framework and the collective decision output by the function. Besides these adapted properties, we define some novel properties to characterise aggregation functions with regard to our coherence notion and to the consideration of dependencies between arguments. Recall that our work is the first to relax the limiting assumption of argument independence in the context of collective decisions.
The first two properties characterise aggregation functions in terms of the labellings that they can take as input. In particular, we first adapt from Awad et al. (2017) the notion of exhaustive domain to characterise aggregation functions defined for any labelling profile; and, then, we modify this property to consider if a function is at least defined for coherent labelling profiles. Moreover, we also define collective coherence as a property characterising aggregation functions that produce coherent outcomes.

Collective coherence (CC). An aggregation function F satisfies CC if F(L) ∈ Coh(TODF) for all L ∈ D.
We consider CC to be the most important property for an aggregation function to satisfy. Notice that an aggregation function fails to satisfy collective coherence when it is not able to produce a coherent labelling. This is the case when there is a contradiction between the collective label (direct opinion) and the collective indirect opinion for some argument. Such a contradiction may pose a threat to the acceptability of collective decisions (Thagard 2002). Notice that collective coherence is the counterpart of the collective rationality property defined in Awad et al. (2017). There, Awad et al. require that the outcome of aggregating labellings is a complete labelling. As argued in Sect. 4.3, our notion of coherence can be viewed as a relaxation of the notion of completeness. Hence, collective coherence can be regarded as our relaxation of collective rationality.
Within a discussion, the opinions of all the agents involved must be considered equally significant. Anonymity is a social choice property that captures this requirement. The non-dictatorship property requires that no agent overrules the opinions of the rest of the agents. Notice that since non-dictatorship follows directly from the satisfaction of anonymity, the former is a weaker version of the latter.
Non-Dictatorship (ND) (Awad et al. 2017). An aggregation function F satisfies ND if no agent ag_i ∈ Ag satisfies that F(L) = L_i for every labelling profile L ∈ D.
Another important property in the social choice literature is unanimity, which characterises the behaviour of aggregation functions when there is agreement among the agents' opinions. Here, we define two unanimity properties that take into account the relationships between the arguments in the target-oriented discussion framework. In particular, we adapt the notion of unanimity to express a different desirable property: if all agents share the very same direct opinion on an argument, we demand that the collective opinion is in line with this agreed opinion. We name this property direct unanimity to reflect that only direct opinions are taken into account. Then, we expand the notion of unanimity to consider the dependencies between the arguments and study the cases when there is unanimity in the indirect opinions. In particular, endorsed unanimity is the counterpart of direct unanimity for indirect opinions: if there is a unanimous indirect opinion for an argument, the collective opinion for the argument must be in line with it.
Direct Unanimity (DU). Let L = (L_1, ..., L_n) be a labelling profile, where L ∈ D. An aggregation function F satisfies DU if, for any a ∈ A such that L_i(a) = l for all L_i ∈ L, where l ∈ {in, out, undec}, F(L)(a) = l holds.
Endorsed Unanimity (EU). Let L = (L_1, ..., L_n) be a labelling profile such that L ∈ D. An aggregation function F satisfies EU if: (i) for any a ∈ A such that a receives full positive support from every L_i ∈ L, F(L)(a) = in; and (ii) for any a ∈ A such that a receives full negative support from every L_i ∈ L, F(L)(a) = out.
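Direct unanimity lends itself to a simple test on a single profile. The Python sketch below is illustrative only: the helper names `unanimous_labels` and `respects_direct_unanimity` are hypothetical, and the check covers one profile and one collective labelling, so it can expose a violation of DU but cannot establish that a function satisfies DU over its whole domain.

```python
L1 = {"N": "out", "a1": "in", "a2": "in", "a3": "in"}    # Alan
L2 = {"N": "in", "a1": "out", "a2": "out", "a3": "in"}   # Bart
L3 = {"N": "in", "a1": "out", "a2": "in", "a3": "in"}    # Cathy
PROFILE = (L1, L2, L3)


def unanimous_labels(profile):
    """Map every argument that all agents label identically to that shared label."""
    first, rest = profile[0], profile[1:]
    return {a: label for a, label in first.items()
            if all(other[a] == label for other in rest)}


def respects_direct_unanimity(profile, collective):
    """True if the collective labelling keeps every unanimously assigned label."""
    return all(collective[a] == label
               for a, label in unanimous_labels(profile).items())


shared = unanimous_labels(PROFILE)
```

In the neighbours' debate the only unanimous direct opinion concerns a_3, which all three agents accept, so any collective labelling that does not accept a_3 would violate DU on this profile.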
In addition to unanimity, we also consider a complementary property, namely supportiveness. This requires that an aggregation function does not label an argument with a label that has not been employed by any agent.
Supportiveness (S) (Awad et al. 2017). An aggregation function F satisfies S if for every argument a ∈ A and for every labelling profile L = (L_1, ..., L_n), L ∈ D, we can find some agent ag_i ∈ Ag for which F(L)(a) = L_i(a) holds.
Monotonicity is a property aimed at capturing how the result of an aggregation function changes as opinions, expressed as labellings on arguments, change. In particular, if some of the direct opinions of an argument change to become the same as its collective labelling, then this collective labelling should remain the same. Here we adapt the monotonicity and in-out-monotonicity properties from Awad et al. (2017). Unlike monotonicity, in-out-monotonicity (we prefer the name binary monotonicity) only considers the in and out labels. We expand the notion of monotonicity with two novel properties that, unlike the notion of monotonicity presented in Awad et al. (2017), consider the opinions of an argument's descendants. The first of these novel properties, which we call familiar monotonicity, determines that when the direct support for the collective labelling of an argument increases, the collective labelling must not change provided that the opinions on the descendants of the argument do not change either. The need for the latter condition stems from the fact that an argument's collective labelling might change after the opinions on its descendants are changed. The second property that we propose is the binary version of familiar monotonicity.
The previous notions of monotonicity are related:

Proposition 3 If an aggregation function is monotonic (respectively binary monotonic), then it satisfies familiar monotonicity (respectively binary familiar monotonicity).
Proof The proof is straightforward because the satisfaction of the hypothesis required by familiar monotonicity (resp. binary familiar monotonicity) implies the satisfaction of the hypothesis required by monotonicity (resp. binary monotonicity).
Finally, the notion of independence (Awad et al. 2017) states that the aggregated label for an argument must depend only on the labels that different agents have for that argument. That is, the aggregated label does not depend on the labels for other arguments. Independence is not a desirable property, but we include it here for completeness.
Independence (Awad et al. 2017). An aggregation function F satisfies independence if, for every argument a ∈ A and any two labelling profiles L, L' ∈ D such that L_i(a) = L'_i(a) for all ag_i ∈ Ag, F(L)(a) = F(L')(a) holds.
In this section we have listed a set of properties to characterise aggregation functions. However, it is important to note that not all of them are equally important. For a multi-party argumentation-based discussion we believe that Collective Coherence is the most important property. If an aggregation function is collectively coherent, then it extracts a coherent labelling, regardless of the coherence of the individual opinions being aggregated. Along with collective coherence, we also consider that aggregation functions should satisfy the two domain-related properties, Exhaustive Domain and Coherent Domain (where we would prefer exhaustive domain to allow wider applicability), and the usual social choice properties of Anonymity and (if that is not possible) Non-Dictatorship. We also consider that aggregation functions should be monotonic, and, given that we want to capture dependencies, Familiar Monotonicity (binary or otherwise) is then desirable. Unanimity is also important, but we consider it less important, since we can imagine cases in which unanimity is not satisfied in order to achieve more important properties such as coherence.11 We do not think that aggregation functions should satisfy the remaining properties, namely Monotonicity (binary or otherwise), Supportiveness and Independence.12 We only include them in order to provide a complete characterisation of aggregation functions.

Designing Aggregation Functions to Enact Collective Decision Making
The purpose of this section is to design aggregation functions that calculate the collective labelling for a labelling discussion problem and, thus, the decision over a target. With this aim, notice that in Sect. 1 we observed that independence cannot be considered a reasonable assumption, and hence our aggregation functions should aim at exploiting dependencies between arguments. At this point, the question is how to exploit dependencies, which fundamentally amounts to deciding how to exploit indirect opinions when computing the aggregated labelling for a given argument. This motivates the design in this section of a family of aggregation functions that exploit indirect opinions in different ways, namely: (i) by giving priority to direct opinions over indirect opinions; (ii) by giving priority to indirect opinions over direct opinions; and (iii) by combining both direct opinions and indirect opinions, considering that they are valuable to the same degree. Besides introducing such functions in Sects. 6.1.2, 6.1.3, and 6.1.4 below, we also investigate the social choice properties that each one satisfies. Thereafter, in Sect. 6.2 we compare the social choice properties satisfied by each aggregation function to elucidate the aggregation function that best exploits indirect opinions. Before that, and for the sake of completeness, this section starts, in Sect. 6.1.1, by introducing an aggregation function that completely disregards indirect opinions: the so-called majority rule. This will allow us to analyse, as part of our discussion in Sect. 6.2, the benefits and drawbacks, in social choice terms, of exploiting indirect opinions.
11 For example, when everyone has voted that a contradictory pair of arguments (e.g., that taxes should be cut to improve the economy and that the budget should be balanced in order to improve the economy) should both be in, we would prefer a function that gives up unanimity and identifies that one of these arguments must be out to ensure collective coherence to a function that ensures unanimity and insists that they must both be in. We realise that there are real life groups, such as the current Republican caucus in the US Congress, which would prefer unanimity to collective coherence in such cases.
12 Note these properties are related to the argument independence assumption that we are relaxing here.
Throughout this section, we will employ the following notation to represent the direct positive and negative support of an argument. Let L = (L_1, ..., L_n) be a labelling profile and a an argument: in_L(a) = |{ag_i ∈ Ag | L_i(a) = in}| denotes the direct positive support of a, whereas out_L(a) = |{ag_i ∈ Ag | L_i(a) = out}| denotes its direct negative support.

Disregarding Dependencies: A Majority Rule
The majority function simply compares the acceptances and rejections received by an argument. The argument will be accepted or rejected depending on whether acceptances or rejections are in the majority. It will be labelled as undecided if there is a tie. Formally,
Definition 14 (Majority function) Given a labelling profile L, the majority function for any argument a is defined as:
\[
M(L)(a) =
\begin{cases}
\mathit{in}, & \text{if } \mathit{in}_L(a) > \mathit{out}_L(a) \\
\mathit{out}, & \text{if } \mathit{in}_L(a) < \mathit{out}_L(a) \\
\mathit{undec}, & \text{otherwise}
\end{cases}
\]
Example 5 (Majority rule in the neighbourhood discussion) Following the neighbours' example, we use the majority function to compute the collective label of each argument. Figure 3 graphically represents the collective labelling obtained. For arguments a_2, a_3 and N there are more in than out opinions; therefore the collective label using M for these arguments is in. For argument a_1 the reverse holds: there are more out than in opinions, and thus its collective label is out.
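The majority rule described above translates directly into code. The following Python sketch is an illustration under the stated description (more acceptances than rejections gives in, the reverse gives out, a tie gives undec); the function name `majority` and the dictionary encoding of labellings are assumptions of this example.

```python
L1 = {"N": "out", "a1": "in", "a2": "in", "a3": "in"}    # Alan
L2 = {"N": "in", "a1": "out", "a2": "out", "a3": "in"}   # Bart
L3 = {"N": "in", "a1": "out", "a2": "in", "a3": "in"}    # Cathy
PROFILE = (L1, L2, L3)


def majority(profile, argument):
    """Collective label of one argument, using direct opinions only."""
    ins = sum(labelling[argument] == "in" for labelling in profile)
    outs = sum(labelling[argument] == "out" for labelling in profile)
    if ins > outs:
        return "in"
    if outs > ins:
        return "out"
    return "undec"  # tie, including the case where nobody takes a side


collective = {a: majority(PROFILE, a) for a in L1}
```

On the neighbours' profile this yields in for N, a_2 and a_3 and out for a_1, as in Example 5. Note that undec labels are simply not counted on either side.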

Exploiting Dependencies: Prioritising Direct Opinions
The next function to study, the so-called opinion first function (OF), is a variation of the majority function that exploits dependencies while prioritising direct opinions.
Figure 4 shows the collective label produced by OF for each argument in the neighbours' example. Since there are no ties for any argument, OF behaves like M, and so its collective labelling accepts a_2, a_3 and N, and rejects a_1.
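One way to read the opinion first idea is sketched below in Python. This is an assumed reading, not the formal definition: the direct majority decides whenever it is not tied, and a tie is broken by comparing the positive and negative support computed over the collective labels of the argument's attackers and defenders. The names `opinion_first`, `sign` and `LABEL` are hypothetical, and the framework is assumed to be acyclic so that the recursion terminates.

```python
ATTACKS = {("a1", "N"), ("a2", "N")}
DEFENCES = {("a2", "a1"), ("a3", "N")}
LABEL = {1: "in", 0: "undec", -1: "out"}


def sign(x):
    return (x > 0) - (x < 0)


def opinion_first(profile, attacks, defences):
    """Assumed OF: direct opinions first, indirect opinions only on ties."""
    collective = {}

    def decide(a):
        if a not in collective:
            # Direct opinion: acceptances minus rejections.
            verdict = sign(sum((l[a] == "in") - (l[a] == "out") for l in profile))
            if verdict == 0:
                verdict = sign(indirect(a))  # tie-break with Pro minus Con
            collective[a] = LABEL[verdict]
        return collective[a]

    def indirect(a):
        """Pro minus Con of `a` over the collective labels of its neighbours."""
        score = 0
        for b, attacked in attacks:
            if attacked == a:
                score += (decide(b) == "out") - (decide(b) == "in")
        for c, defended in defences:
            if defended == a:
                score += (decide(c) == "in") - (decide(c) == "out")
        return score

    for a in profile[0]:
        decide(a)
    return collective


neighbours = ({"N": "out", "a1": "in", "a2": "in", "a3": "in"},
              {"N": "in", "a1": "out", "a2": "out", "a3": "in"},
              {"N": "in", "a1": "out", "a2": "in", "a3": "in"})
of_neighbours = opinion_first(neighbours, ATTACKS, DEFENCES)

# A hypothetical two-agent profile in which the direct opinion on N is tied.
tied = ({"N": "in", "a1": "out", "a2": "out", "a3": "in"},
        {"N": "out", "a1": "out", "a2": "out", "a3": "in"})
of_tied = opinion_first(tied, ATTACKS, DEFENCES)
```

On the neighbours' profile there are no ties, so the result coincides with the majority rule. In the hypothetical tied profile, the majority rule would leave N undecided, whereas this reading accepts N because both attackers are collectively rejected and the defender is accepted.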

Exploiting Dependencies: Prioritising Indirect Opinions
As a counterpart of OF, next we define and study the so-called Support First function (SF), which prioritises indirect opinions over direct opinions. SF considers first indirect opinions to obtain an aggregated opinion on an argument. If using indirect opinions leads to a tie, then SF uses direct opinions to resolve the tie, if possible. Formally,
Definition 16 (Support First Function) Given a labelling profile L, the support first function for any argument a is calculated as:
\[
SF(L)(a) =
\begin{cases}
\mathit{in}, & \text{if } Pro_{SF(L)}(a) > Con_{SF(L)}(a) \\
\mathit{out}, & \text{if } Pro_{SF(L)}(a) < Con_{SF(L)}(a) \\
\mathit{in}, & \text{if } Pro_{SF(L)}(a) = Con_{SF(L)}(a) \text{ and } \mathit{in}_L(a) > \mathit{out}_L(a) \\
\mathit{out}, & \text{if } Pro_{SF(L)}(a) = Con_{SF(L)}(a) \text{ and } \mathit{in}_L(a) < \mathit{out}_L(a) \\
\mathit{undec}, & \text{otherwise}
\end{cases}
\]
Figure 5 shows the collective label produced by SF for each argument in the neighbours' example. Recall that SF considers first indirect opinions. Since arguments a_2 and a_3 have no descendants, their collective labels stem from the majority in the direct opinion, and hence SF(L)(a_2) = SF(L)(a_3) = in. As to argument a_1, SF first considers the collective label of a_2, which is in, and thus SF(L)(a_1) = in. Finally, target N is attacked by arguments a_1 and a_2, both with collective label in, and defended by argument a_3 with label in. Therefore, the indirect collective support of N is against N, and hence SF rejects it, namely SF(L)(N) = out.
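The support first behaviour described above can be reproduced with a short recursive procedure. The Python sketch below is an illustration under assumptions: the names `support_first`, `sign` and `LABEL` are hypothetical, indirect opinions are taken to be the positive and negative support computed over the collective labels already assigned to attackers and defenders (as the worked example suggests), and the framework is assumed to be acyclic.

```python
ATTACKS = {("a1", "N"), ("a2", "N")}
DEFENCES = {("a2", "a1"), ("a3", "N")}
LABEL = {1: "in", 0: "undec", -1: "out"}


def sign(x):
    return (x > 0) - (x < 0)


def support_first(profile, attacks, defences):
    """Indirect opinions first; direct opinions only resolve indirect ties."""
    collective = {}

    def indirect(a):
        """Pro minus Con of `a` over the collective labels of its neighbours."""
        score = 0
        for b, attacked in attacks:
            if attacked == a:
                score += (decide(b) == "out") - (decide(b) == "in")
        for c, defended in defences:
            if defended == a:
                score += (decide(c) == "in") - (decide(c) == "out")
        return score

    def decide(a):
        if a not in collective:
            verdict = sign(indirect(a))
            if verdict == 0:
                # Indirect tie (or no attackers/defenders): use the direct majority.
                verdict = sign(sum((l[a] == "in") - (l[a] == "out") for l in profile))
            collective[a] = LABEL[verdict]
        return collective[a]

    for a in profile[0]:
        decide(a)
    return collective


neighbours = ({"N": "out", "a1": "in", "a2": "in", "a3": "in"},   # Alan
              {"N": "in", "a1": "out", "a2": "out", "a3": "in"},  # Bart
              {"N": "in", "a1": "out", "a2": "in", "a3": "in"})   # Cathy
sf_neighbours = support_first(neighbours, ATTACKS, DEFENCES)
```

The sketch accepts a_2 and a_3 through their direct majorities, accepts a_1 because its only defender a_2 is collectively accepted, and rejects N because two accepted attackers outweigh one accepted defender, matching the outcome described for Fig. 5.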

Exploiting Dependencies: Combining Direct and Indirect Opinions
Finally, after studying functions giving priority to either direct opinions, OF, or indirect opinions, SF, in what follows we design an intermediate function balancing both. With this aim, we introduce the balanced function BF, which equally combines direct and indirect support. The following definition might seem a bit complex, but the underlying rationale is simple: for each argument, the balanced function computes both its direct and indirect support to choose the label that best represents both.
Example 8 (Neighbourhood discussion) Figure 6 shows the aggregated opinion and the decision over the target for our neighbourhood example obtained by the balanced aggregation function. As shown in the picture, neighbours collectively accept arguments a_2 and a_3, whereas argument a_1 is undecided. Finally, the decision over the target is to accept it (i.e., BF(L)(N) = in) and the norm is accepted.
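One way to make the balancing idea concrete is sketched below. It is an assumed reading, not the formal definition: the direct opinion and the indirect opinion each contribute +1, 0 or -1 (majority for, tie, majority against), and the collective label follows the sign of their sum. The counting of indirect support (accepted defenders and rejected attackers in favour, accepted attackers and rejected defenders against), the function names and the three-agent profile are likewise assumptions made for illustration.

```python
def sign(x, y):
    """Return 1 if x > y, -1 if x < y, and 0 on a tie."""
    return (x > y) - (x < y)

def balanced(arguments, attacks, defences, profile):
    """Collective labelling that weighs direct and indirect opinions equally."""
    collective = {}

    def label(a):
        if a not in collective:
            pro = con = 0
            for kind, edges in (("attack", attacks), ("defence", defences)):
                for b, target in edges:
                    if target != a or label(b) == "undec":
                        continue  # unrelated edge, or undecided descendant
                    # Accepted defender / rejected attacker -> pro; otherwise con.
                    if (kind == "defence") == (label(b) == "in"):
                        pro += 1
                    else:
                        con += 1
            ins = sum(lab[a] == "in" for lab in profile)
            outs = sum(lab[a] == "out" for lab in profile)
            score = sign(ins, outs) + sign(pro, con)
            collective[a] = {1: "in", 0: "undec", -1: "out"}[sign(score, 0)]
        return collective[a]

    return {a: label(a) for a in arguments}

# Neighbourhood-shaped graph; the profile is hypothetical.
arguments = ["a1", "a2", "a3", "N"]
attacks = {("a1", "N"), ("a2", "N")}
defences = {("a2", "a1"), ("a3", "N")}
profile = [
    {"a1": "out", "a2": "in", "a3": "in", "N": "in"},
    {"a1": "out", "a2": "in", "a3": "in", "N": "in"},
    {"a1": "in", "a2": "out", "a3": "in", "N": "out"},
]
result = balanced(arguments, attacks, defences, profile)
```

With this reading, a_1 is undecided because its direct rejection is offset by the accepted defender a_2, and N is accepted because its indirect opinions tie while its direct opinions favour acceptance.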

Comparing Aggregation Functions
This section compares the results obtained by the aggregation functions proposed in the previous sections. The results themselves are contained in Appendix A, and we refer the reader who wants to understand the formal propositions and proofs of the properties fulfilled by each function to that appendix. Here, we just use those results to compare the performance of the functions. We make this comparison with the aid of Table 3, which shows the social choice properties fulfilled by each aggregation function. Table 3 splits social choice properties into two groups: those identified as desirable in Sect. 5.2 when exploiting dependencies between arguments, and those that are not so relevant for our purposes but are typically referred to in the social choice literature. Recall that, as stated in the previous sections, the most important property is collective coherence because it ensures the rationality of the outcome of an aggregation function. Table 3 shows us the relation between this property and the "degree" of indirect opinion involved in the decision making represented by each aggregation function. From left to right in Table 3: M disregards indirect opinions; OF prioritises direct opinions over indirect opinions; BF equally considers direct and indirect opinions; and finally, SF prioritises indirect opinions.
By analysing Table 3 we draw several interesting observations regarding: (i) the positive and negative effects of exploiting dependencies; (ii) the aggregation function that offers the best compromise between exploiting direct and indirect opinions; and (iii) the positive and negative effects of introducing uncertainty by means of the undec label. Second, despite obtaining major benefits, particularly in terms of satisfaction of collective coherence, we pay a price for exploiting dependencies, namely:
- The exploitation of indirect opinions impacts the satisfaction of unanimity and monotonicity properties. Notice that as we move from left to right in Table 3, the unanimity and monotonicity properties become less satisfied, clearly relating the satisfaction of the properties with the level of indirect opinion involved: the higher the importance of indirect opinions in an aggregation function, the fewer the unanimity and monotonicity properties that are satisfied.
- Exploiting dependencies between arguments impedes independence. As expected, even a little involvement of indirect opinions in the decision making prevents the fulfilment of this property, and, therefore, the fulfilment of other social choice properties (not considered in this paper) stronger than independence. However, note that we do not regard this observation as a negative result. Recall from our discussion in Sect. 2.3 that Awad et al. (2017) consider the necessity of independence questionable (because of the existing dependencies between arguments), while the literature considers independence as too strong and not very plausible.
At this point, given the above-mentioned pros and cons regarding the exploitation of dependencies, we are ready to identify what we believe is the best-in-class aggregation operator:
- BF provides the best trade-off between exploiting direct and indirect opinions. On the one hand, OF does not satisfy collective coherence, but it satisfies both types of unanimity and the weaker versions of monotonicity. On the other hand, while SF satisfies collective coherence, it fails at satisfying unanimity and monotonicity properties. BF sits between OF and SF.
Last but not least, we turn our attention to the benefits and drawbacks of introducing the undec label to cope with uncertainty:
- The introduction of uncertainty favours the general treatment of any kind of labelling profile. Implicitly, in our approach we use the undec label to obtain an outcome even in those cases where there is no clear decision over an argument. The introduction of the undec label helps resolve ties (when the number of acceptances equals the number of rejections) that would otherwise have no outcome. Not allowing the undec label would restrict the domain of the aggregation functions and hamper decision making despite the existence of valid opinions. Note that none of the aggregation functions that we have introduced suffers from this restriction, since they all fulfil the exhaustive and coherent domain properties.
- The introduction of uncertainty negatively affects monotonicity properties. The use of the undec label may cause the lack of a "positive" or "negative" decision regarding the acceptance of an argument. This fact impacts directly on the satisfaction of the monotonicity properties, and hence the need for weaker versions such as binary monotonicity and binary familiar monotonicity.
Besides the general observations compiled above, Table 3 is also valuable to help us individually analyse each of the aggregation functions introduced in this paper:
- We have shown that M does not satisfy our most important property, collective coherence. Therefore, M does not ensure the coherence of the labelling obtained as a collective decision, which might thus contain irrational sets of argument labels. Despite this fact, the majority function satisfies many of the other desired social choice properties without any restrictions, with the exception of the endorsed unanimity property, which is restricted to 0-coherent profiles. We also observe that while M satisfies restricted versions of monotonicity properties, it does not satisfy their non-restricted versions. Finally, the non-exploitation of dependencies guarantees the satisfaction of the independence property, but the undec label resulting from a tie prevents the satisfaction of supportiveness.
- At first sight, the OF function satisfies several desirable social choice properties without restrictions, except for endorsed unanimity, which requires coherent labelling profiles in order to hold. Nonetheless, OF still fails, just like M, to satisfy collective coherence, and hence we cannot ensure the rationality of the collective decision. Finally, OF does not satisfy the non-binary monotonicity properties, and, as a result of exploiting indirect opinions, it loses the independence property. To summarise, the way OF exploits indirect opinions is not enough, as observed above in our general analysis.
- SF increases the relevance of indirect opinions when computing a collective labelling. On the one hand, this entails the satisfaction of collective coherence. On the other hand, this negatively impacts the satisfaction of monotonicity, since SF loses binary monotonicity with respect to OF. Furthermore, SF is also further from satisfying endorsed unanimity than OF, since SF does not satisfy endorsed unanimity even when we impose some kind of coherence on agents' individual labellings. Finally, like M and OF, SF also satisfies: exhaustive and coherent domain, anonymity, non-dictatorship, and binary familiar monotonicity.
- BF provides a trade-off between OF and SF. First, BF satisfies most of the desirable properties identified in Sect. 5.2, including collective coherence. Note that BF only satisfies endorsed unanimity in the case of 0-coherent labellings, whereas SF did not satisfy any of the unanimity properties. Second, BF does not satisfy properties such as direct unanimity and supportiveness, but recall that the first one was considered the least desirable property and that the second one was not even considered as desirable.

Computational Analysis
The purpose of this section is twofold. First, given a labelling discussion problem, in Sect. 7.1, we detail an algorithm for computing a collective decision on its target. Thereafter, we empirically analyse the use of that algorithm to solve real-world collective decision problems.

Computing the Decision Over a Target
Consider a discussion framework, TODF = ⟨A, →, ⊩, τ⟩ (with → the attack relation and ⊩ the defence relation), with a target τ for which we aim at computing a collective label. Thus, we require a profile L reflecting the opinions of the agents involved in the discussion and a function to aggregate the opinions in the profile (be it SF, OF, or BF). Now, observe that according to Proposition 1, the graph associated to the TODF is a DAG. Therefore, the computation of the collective labels for the arguments in the discussion framework can be performed while traversing its associated graph, henceforth referred to as G_TODF. This is where we can resort to topological sorting (Kahn 1962) to perform graph traversal. Thus, we propose to embed the computation of the collective labels for the arguments and the target of a discussion framework into a topological sorting algorithm. From this it follows that the computation of the collective label for the target is asymptotically linear in the number of nodes (arguments) plus edges (attack and defence relationships) in the associated graph of the discussion framework, namely O(|A| + |→| + |⊩|).
Function computeCollectiveDecision in Algorithm 1 calculates the collective decision for the target τ of a discussion framework T O DF from a profile L and aggregation function F. In Ganzer-Ripoll et al. (2017b) we provide a public implementation of Algorithm 1 together with all the functions introduced in Sect. 6.
Algorithm 1 Compute collective decision
function ComputeCollectiveDecision(G_TODF, τ, F, L)
    PendingArguments ← arguments with no descendants (neither attacked nor defended)
    while PendingArguments is not empty do
        b ← remove argument from PendingArguments
        compute the collective label F(L)(b)
        for each argument c that b attacks or defends do
            remove the edge from b to c from G_TODF
            if there are no incoming edges for argument c then
                add argument c to PendingArguments
    return F(L)(τ)    ▷ Collective label for target τ
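A minimal Python sketch of this traversal is given below. It is not the public implementation referred to above: the function signature, the encoding of edges as pairs (b, c) meaning that b attacks or defends c, the callback aggregate(ins, outs, pro, con), and the way indirect support is counted (accepted defenders and rejected attackers in favour, accepted attackers and rejected defenders against) are all assumptions made for illustration.

```python
from collections import defaultdict, deque

def compute_collective_decision(arguments, attacks, defences, target, profile, aggregate):
    """Kahn-style traversal: label an argument once all its descendants are labelled."""
    outgoing = defaultdict(list)             # b -> [(c, kind)] for every edge b -> c
    remaining = dict.fromkeys(arguments, 0)  # unprocessed incoming edges per argument
    for kind, edges in (("attack", attacks), ("defence", defences)):
        for b, c in edges:
            outgoing[b].append((c, kind))
            remaining[c] += 1
    pro = dict.fromkeys(arguments, 0)        # indirect support gathered so far
    con = dict.fromkeys(arguments, 0)
    pending = deque(a for a in arguments if remaining[a] == 0)
    collective = {}
    while pending:
        b = pending.popleft()
        ins = sum(lab[b] == "in" for lab in profile)
        outs = sum(lab[b] == "out" for lab in profile)
        collective[b] = aggregate(ins, outs, pro[b], con[b])
        for c, kind in outgoing[b]:
            if collective[b] != "undec":
                # Assumed counting: accepted defender or rejected attacker -> pro.
                if (kind == "defence") == (collective[b] == "in"):
                    pro[c] += 1
                else:
                    con[c] += 1
            remaining[c] -= 1                # "remove" the edge b -> c
            if remaining[c] == 0:
                pending.append(c)
    return collective[target]

def sign(x, y):
    return (x > y) - (x < y)

LABEL = {1: "in", 0: "undec", -1: "out"}

def majority_rule(ins, outs, pro, con):
    return LABEL[sign(ins, outs)]

def support_first_rule(ins, outs, pro, con):
    # Indirect opinions first; direct opinions only break ties.
    return LABEL[sign(pro, con) or sign(ins, outs)]

# Neighbourhood-shaped graph with a hypothetical three-agent profile.
arguments = ["a1", "a2", "a3", "N"]
attacks = {("a1", "N"), ("a2", "N")}
defences = {("a2", "a1"), ("a3", "N")}
profile = [
    {"a1": "out", "a2": "in", "a3": "in", "N": "in"},
    {"a1": "out", "a2": "in", "a3": "in", "N": "in"},
    {"a1": "in", "a2": "out", "a3": "in", "N": "out"},
]
decision_m = compute_collective_decision(arguments, attacks, defences, "N", profile, majority_rule)
decision_sf = compute_collective_decision(arguments, attacks, defences, "N", profile, support_first_rule)
```

Each edge is visited once, so the traversal itself is linear in arguments plus edges. In this sketch the direct opinions are tallied inside the loop, which adds a factor proportional to the number of agents per argument; precomputing the tallies would keep the traversal cost separate from the size of the profile.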

Empirical Analysis
This subsection empirically analyses the time required by our implementation of Algorithm 1 to compute collective decisions. Our purpose is to investigate whether our approach to collective decision making can be used in practice.
In order to do this, we took as a reference Parlement et Citoyens (2017), which, as mentioned above, is a well-established e-government participation site that enables French citizens to participate in the development of laws by making, debating and voting for law proposals. Table 4 provides some details of the 12 consultations that had been completed as of November 2017 (the time of writing). The first column gives the topic of the proposal, and illustrates that these policy consultations are conducted over a wide range of different topics ranging from sustainability to migrants and even modification of the constitution. Each consultation is structured in chapters grouping a number of articles. The number of articles for a proposal is given in the second column. For each individual article, citizens can provide pro and con arguments. The total number of arguments for each proposal is given in the third column. The fourth ("arg/art") column shows the ratio of arguments per article for each of these proposals. Thus, there are 6.2 arguments per article on average. The fifth column shows the number of people who participated per proposal. The sixth column, labelled "arg%" in Table 4, provides the number of arguments as a percentage of the number of participants. The average is 2.84%, meaning that the average participant makes around 3 arguments on each consultation they participate in. Moreover, participants can vote for both the arguments and the article proposals. The last column in Table 4 shows the aggregated number of votes. Finally, it is also worth mentioning that other participation sites, such as Quoners (2017) or Decide Madrid (2017), are characterised by a similar proportion of arguments to proposals. Using this real-world scenario as a reference, we artificially generated discussion frameworks where arguments are the nodes of a directed acyclic graph and the edges represent the relationships between the arguments.
Given a number of arguments, the graph representing a discussion is a directed acyclic random graph with a probability of 0.5 of creating an edge between any two nodes. Given an edge between two nodes, there is a probability of 0.5 that the edge represents an attack between the arguments, and a probability of 0.5 that it represents support between the arguments. The directions of the relationships between arguments are also randomly determined during the generation of the directed acyclic graph.
Given a discussion framework, we then generated labellings to compose a labelling profile. Each labelling within a profile is built by assigning a random label to each argument in the directed acyclic graph representing the discussion framework. Hence, randomly generated labellings are not guaranteed to be coherent. Despite that, recall that two of our aggregation functions do ensure collective coherence of the resulting decision.
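The generation procedure described above could be sketched as follows. This is an assumed reconstruction rather than the code used in the experiments: the function names are hypothetical, acyclicity is obtained by only linking arguments forward along a random ordering (which also randomises edge directions), and labels are drawn uniformly from in, out and undec.

```python
import random

LABELS = ("in", "out", "undec")

def random_discussion(n_args, edge_prob=0.5, rng=random):
    """Random acyclic discussion graph over arguments 0..n_args-1.

    Returns (order, attacks, defences); edges are pairs (b, c) meaning that
    b attacks/defends c, and every edge points forward along `order`.
    """
    order = list(range(n_args))
    rng.shuffle(order)                       # random ordering => random directions
    attacks, defences = set(), set()
    for i in range(n_args):
        for j in range(i + 1, n_args):
            if rng.random() < edge_prob:     # link this pair of arguments
                edge = (order[i], order[j])
                # Attack or defence with equal probability.
                (attacks if rng.random() < 0.5 else defences).add(edge)
    return order, attacks, defences

def random_profile(n_args, n_agents, rng=random):
    """One uniformly random labelling per agent (not necessarily coherent)."""
    return [{a: rng.choice(LABELS) for a in range(n_args)} for _ in range(n_agents)]

rng = random.Random(0)                       # fixed seed for reproducibility
order, attacks, defences = random_discussion(20, 0.5, rng)
profile = random_profile(20, 100, rng)
```

Because every edge goes from an earlier to a later position in the shuffled ordering, the resulting graph cannot contain a cycle.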
All the computations of collective decisions for our artificially generated discussion frameworks were performed on an Ubuntu 16.04 box with an Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz and 4 cores. Furthermore, our experiments set F to be BF, since we consider it to be the best-in-class aggregation function. Computation times with OF and SF are expected to be similar because they both aggregate direct and indirect opinions just as BF does. To establish how the time required to compute a collective decision varies as the number of arguments and participants grows, we looked at discussions with 100 to 500 arguments, and 10^3 to 10^5 labellings (representing participants in a discussion). Comparing with Table 4, we can see that these numbers go far beyond the number of arguments and participants involved in debates in real-world platforms. For each pair of number of arguments and number of participants we generated 100 artificial debates as described above, and computed the collective decision for each debate. Figure 7 shows the average time (in seconds) required to compute a collective decision as the number of arguments and participants varies. From the figure we learn that we can employ our approach to compute collective decisions in only a few seconds for debates involving as many as 500 arguments and 10^4 participants. Computing collective decisions for 10^5 participants becomes more demanding, but all the scenarios can still be solved in a few tens of seconds.
Following the above analysis we investigated the sensitivity of the time required to compute a collective decision to the density of connections between arguments in a discussion framework. For this second study we set the number of participants to 10^3 (in the middle of the range we studied before) and varied the number of arguments between 100 and 500, while the probability for edge creation in the random graph representing the discussion framework was set to a value from {0.25, 0.5, 0.75}. This allowed us to generate artificial debates with low, medium, and high density of connections between arguments, corresponding to probabilities 0.25, 0.5, and 0.75 respectively. Figure 8 clearly shows that the time to compute a collective decision is affected by the density of connections between arguments as the number of arguments grows: the larger the number of arguments, the larger the impact of the density of connections on computational time. Moreover, the larger the density, the more costly it is to compute a collective decision. At this point, notice that even the lowest density of connections considered in Fig. 8 goes far beyond the densities that we find in real-world debates. For instance, when considering a low density scenario (by setting the probability for edge creation to 0.25), each argument receives attacks and defences from ∼12% (on average) of the total number of arguments. That means that in a debate with 500 arguments, each argument would be related (on average) to ∼60 other arguments. This already amounts to a rather dense debate compared to real-world debates, where humans add new arguments by relating them to a few arguments. In fact, as we have previously mentioned, many participation sites only allow participants to pose arguments in favour of or against a proposal (i.e., specifying a single relationship for each new argument).
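The ∼12% and ∼60 figures can be checked with a short calculation. It assumes, as a reading of the generation procedure rather than a statement from the text, that each unordered pair of arguments is linked with the edge-creation probability and that either direction of the link is equally likely, so that on average half of an argument's links are incoming. The function name is hypothetical.

```python
def expected_incoming_edges(n_args, edge_prob):
    """Expected number of attacks and defences received by one argument.

    Assumes each of the other n_args - 1 arguments is linked with probability
    edge_prob, and that a link points towards this argument half of the time.
    """
    return edge_prob * (n_args - 1) / 2

received = expected_incoming_edges(500, 0.25)  # low-density scenario
share = received / 500                         # fraction of all arguments
```

Under these assumptions an argument in a 500-argument, low-density debate receives about 62 attacks and defences, i.e. roughly 12% of all arguments.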
To summarise, given that we have evaluated configurations of artificial debates whose scale goes far beyond those of real-world participation systems, we consider that our approach has the potential to be used in practice, even in the largest existing scenarios.
decision, we must look to produce an outcome that is coherent, namely one that is free of contradictions. Doing that has been the main goal in this paper.
In particular, our approach to solving the above-mentioned collective decision problem makes several contributions. First, we have proposed a mechanism to support debates. More precisely, we have introduced a novel multi-agent argumentation framework aimed to articulate a discussion on a given targeted topic or proposal. Thanks to our framework, participants in a debate can express arguments for and against this proposal, indicate the relationships between arguments, and express their opinions about arguments. Furthermore, our framework makes it possible to determine whether a participant's opinion is reasonable (coherent) or not.
Second, we enrich our multi-agent argumentation framework with a novel set of aggregation functions that operationalise the combination of individual opinions. This operationalisation results in a consensual decision over the topic under discussion. Since, as argued in Sect. 1, independence cannot be considered as a reasonable assumption when dealing with arguments, we have designed a family of aggregation functions capable of exploiting dependencies between arguments in different ways. We proved that two of those functions guarantee the coherent collective rationality of the outcome. This is the case for any sort of labelling profile, even those in which participants' opinions are not individually coherent. We also studied several social choice properties of our aggregation functions, inspired by the work in Awad et al. (2017), where classical properties from social choice are also checked in argumentation settings. Our study produced insights into the design of an aggregation function and the price paid to ensure coherence and handle uncertainty. We showed that either disregarding indirect opinions or prioritising direct opinions over indirect opinions is not enough to achieve collective coherence. However, the necessary exploitation of indirect opinions to obtain collective coherence comes at a price: the higher the importance of indirect opinions in an aggregation function, the fewer the unanimity and monotonicity properties that are satisfied. In the end, we observed that the balanced aggregation function, which treats direct and indirect opinions equally, is the one that provided the best trade-off between exploiting direct and indirect opinions. As to uncertainty management, although the introduction of uncertainty favours the general treatment of any kind of labelling profile, it negatively affects monotonicity properties.
Overall, the contributions in this paper break new ground in bringing together the fields of argumentation and computational social choice. We believe that the intersection of these two fields is a sweet spot in which to base the investigation of principled debate-based systems.
In future work, we will particularly focus on three directions. First, we will look at generalising the argumentation framework to allow it to capture more natural debates than the rather simple discussions that are currently captured. Here we see the framework from McBurney and Parsons (2002) as a suitable starting place. Second, we plan to enrich the expressiveness of our target-oriented discussion framework so that opinions about arguments can be expressed not as "accepted", "rejected", and "don't know", as is currently the case, but instead as a number, indicating the degree to which the argument is accepted. This will obviously require the design of another family of aggregation functions. Third, we plan to investigate the type of interfaces required by humans to participate in target-oriented discussions without being overwhelmed by their complexity. In this line, the works by Gabbriellini and Torroni (2015), Klein (2012), and Sklar et al. (2016) appear as promising pointers to the direction we should take.

A Formal Proofs and Results
In this section we detail the results of the aggregation functions, defined in Sect. 6, proving all the properties summarized in Table 3.
First, in Sect. A.1, taking advantage of the similarities between the proposed aggregation functions, we prove some generic results applicable to all of them. Second, in the remaining sections of this appendix, we prove the remaining properties for each aggregation function.

A.1 General Results
The next proposition groups the properties equally fulfilled by all aggregation functions.

Proposition 4 The Majority function M, the Opinion First function OF, the Support First function SF and the Balanced function BF satisfy the following properties:
(i) Exhaustive and Coherent Domain; (ii) Anonymity and, hence, Non-Dictatorship.

Proof of Proposition 4 (i) Exhaustive and Coherent Domain.
By the definitions in Sect. 6, the functions defined compute an outcome using the cardinal number of different kinds of finite sets whose elements are not restricted by properties of the labellings. Hence, whatever the labelling profiles are (coherent or not), these sets can be computed, and, therefore, the functions can compute an outcome.
Hence, the direct opinion of an argument is not affected by any permutation of the agents. Regarding indirect opinions, note that for any argument its indirect opinion depends on the direct and indirect opinions of its descendants. For any argument a without descendants, Pro_F(L)(a) = Pro_F(L')(a) = 0 and Con_F(L)(a) = Con_F(L')(a) = 0. For any argument b whose descendants all have no descendants themselves, its indirect opinion is determined by the direct opinions on its descendants, which do not change under the permutation, and hence Pro_F(L)(b) = Pro_F(L')(b) and Con_F(L)(b) = Con_F(L')(b). Applying this reasoning recursively, we conclude that the indirect opinion of any argument is not affected by the permutation of the agents.
As Fig. 9c and d, respectively. Notice that changing the direct opinion of agent 2 on a from out to undec changes the aggregated labelling, F(L')(a) = in ≠ undec = F(L)(a), disproving familiar monotonicity, and therefore monotonicity. (iv) Supportiveness.
The previous example is also a counter-example for supportiveness by only using L and its aggregation outcome F(L) (see Fig. 9a, c). Note that F(L)(a) = undec though no agent's opinion is undec on a.
The following lemma, relating coherence to positive and negative support, will help us in proving results regarding the Endorsed Unanimity property for some functions. If L is coherent and L(a) = out, then, by coherence, Pro_L(a) ≤ Con_L(a); but Pro_L(a) = m and Con_L(a) = 0, and hence we obtain a contradiction. Therefore L(a) ≠ out. Moreover, suppose that L is 0-coherent; then it cannot be that L(a) = undec, because that would mean, by definition of 0-coherence, that Pro_L(a) = Con_L(a), which is not the case. The proof goes analogously for the case with full negative support.
In the following we analyse the satisfaction of the remaining properties (i.e., unanimity, endorsed unanimity, binary monotonicity, binary familiar monotonicity, independence and collective coherence) for each aggregation function.

A.2 Majority Function
Proposition 5 The Majority function M: Satisfies the following properties: Does not satisfy the following properties:

Proof of Proposition 5 (i) Unanimity.
M trivially satisfies unanimity, since unanimity over an argument is the greatest majority that can be achieved.  Fig. 10b) which is not a coherent labelling. (vi) Endorsed Unanimity.
The previous example provides as well a counter-example for endorsed unanimity. Note that the argument a receives full negative support though its final aggregated label is in. Does not satisfy the following properties:

A.3 Opinion First Function
(iv) Collective Coherence; (v) Endorsed Unanimity; (vi) Independence. The counter-example of Proposition 5(vi) (see Fig. 10a) also serves as a counterexample for O F; i.e., computing the opinion first function over that profile we obtain that O F(L)(b) = in though the argument a receives full negative support. (vi) Independence.

Proof of
We prove this by counter-example (see Fig. 11

A.4 Support First Function
Proposition 7 The Support first function S F: Satisfies the following properties: (i) Collective Coherence; (ii) Binary Familiar Monotonicity.
Proof ( Analogous reasoning can be applied when l = out and S F(L)(a) = out, and hence the binary familiar monotonicity for S F holds. (iii) Binary Monotonicity.
We will employ the counter-example depicted in Fig. 12 (see Fig. 12c, d). This counter-example shows that due to the change of the direct opinion about argument a to out, the aggregated labelling for argument a changes to in (i.e., SF(L')(a) = in). (iv) Direct unanimity.
Using the counter-example of Fig. 12, both labelling profiles (i.e., L and L') are unanimous on the labels on a, but the aggregated label for a does not agree with them. (v) Endorsed Unanimity.
The proof requires an example of an argument with two levels of descendants (see Fig. 13). Consider a TODF = ⟨{a, b, c}, {(b, a), (c, b)}, ∅, {a}⟩ with one labelling profile L formed by the following labelling: L(a) = L(b) = L(c) = in (see Fig. 13a). The aggregated labelling obtained by SF is SF(L)(a) = in = SF(L)(c) and SF(L)(b) = out (see Fig. 13b). This example shows that although a has full negative support (b is labelled in in L), the aggregated label for a is accepted (labelled with in), which contradicts the support. (vi) Independence. The counter-example for Proposition 6(vi) (see Fig. 11) also applies to SF. In Fig. 14d). As can be seen, BF(L)(a) = in and by changing a's label to in on L the aggregated labelling of a changes to undec due to the changes in the descendant's labels. (v) Unanimity. Fig. 15 represents a counter-example. Let a TODF contain an argument τ = a, which is defended by five other arguments {a_1, a_2, a_3, a_4, a_5}. The TODF involves the argument labellings of three agents, L_1, L_2, and L_3 (see Fig. 15a): (1) L_1(a) = L_1(a_1) = L_1(a_2) = L_1(a_3) = in and L_1(a_4) = L_1(a_5) = out, (2) L_2(a) = L_2(a_1) = L_2(a_2) = L_2(a_4) = in and L_2(a_3) = L_2(a_5) = out, and (3) L_3(a) = L_3(a_1) = L_3(a_2) = L_3(a_5) = in and L_3(a_3) = L_3(a_4) = out. Notice that the three agents agree on accepting the target, and hence there is unanimous opinion on a. Figure 15b depicts the resulting labelling when computing the BF function over the labelling profile L. Since arguments a_1 and a_2 are collectively accepted (BF(L)(a_1) = BF(L)(a_2) = in) and arguments a_3, a_4, and As we can see, a has full negative support but the collective label is undec instead of out. (v) Independence.
The counter-example of Proposition 6(vi) (see Fig. 11) applies to this function as well. In this case, L(a) = L'(a) = undec, BF(L)(a) = out and BF(L')(a) = in.