1 Introduction

Moral arguments, and their ability to convince in particular, depend not only on their logical structure but also on the moral values they promote and how these values align with those of the audience (Perelman 1980). Different audiences may find different arguments, or different aspects of the same argument, persuasive, depending on the moral values and beliefs they hold (Bench-Capon and Atkinson 2009; Bench-Capon 2003). In our approach, a moral argument is a discourse unit used to convince an audience by appealing to morals. This involves presenting a claim expressing a specific moral viewpoint on a particular moral issue, accompanied by premises that align with moral considerations. Understanding and applying moral reasoning requires identifying the moral principles that serve as common intuitions (Hare 1981).

The Moral Foundations Theory, MFT (Haidt and Joseph 2004), outlines five moral foundations (intuitions) characterised by a pair of opposing values, i.e. virtues and vices (see Table 1): Care/Harm, Fairness/Cheating, Loyalty/Betrayal, Authority/Subversion, Sanctity/Degradation. This provides a framework for identifying the moral principles that underpin an individual’s or group’s initial moral judgements.

These foundations are instrumental in explaining the moral reasoning of individuals and groups. In general, they can be effectively incorporated into the Argument Scheme from Values, which can be aligned with the dimensions of the MFT to evaluate whether a particular action or decision aligns with or violates these morals. Within this scheme, a positive value bolsters the commitment to a goal, reflecting the virtues inherent in the moral foundations. For example, if a person values fairness (a positive value), it might support their commitment to a goal that promotes equality. Conversely, a negative value in the Argument Scheme from Values contradicts the commitment to a goal, which can be linked to the transgression of a moral foundation. For example, if an individual perceives dishonesty (a negative value), it might undermine their commitment to a goal that necessitates trust.

Table 1 Examples of the MFT moral foundations from the analysed corpora

The Care/Harm foundation, emphasising virtues such as kindness, compassion and nurturance, reflects an individual’s instinctive care for others and the avoidance of harm. It closely resembles the scheme of Argument from Consequences (Walton et al. 2008; Walton 1996), as the positive and negative consequences of an action appeal to our intuitive ethics of care and harm, respectively. Fairness/Cheating underscores the significance of justice and rights, highlighting an individual’s inherent sense of fairness and distaste for deceit. This foundation appears to align with the Argument from Fairness (Justice) (Hansen and Walton 2013; Walton and Macagno 2015; Walton and Hansen 2013; Walton 2012; Walton et al. 2008). The Loyalty/Betrayal foundation pertains to virtues such as patriotism and self-sacrifice for the group, which correlates with the schemes of the Argument from Commitment and the Argument from Sacrifice (Walton et al. 2008). The Authority/Subversion foundation, which values obedience and deference, emphasises adherence to tradition and legitimate authority. The Argument from Authority scheme (Walton et al. 2008) and, more specifically, the concept of deontic authority (Koszowy and Walton 2019) can be used to justify or challenge hierarchical structures or traditions, which are integral elements of the Authority/Subversion moral foundation. Finally, the Sanctity/Degradation foundation embodies the aspiration to protect what is deemed sacred, be it tangible entities, philosophical ideals, or institutional structures. A corresponding argument scheme is the Argument from Sacred Scriptures, which is deeply rooted in preserving the sanctity of these revered texts and the moral guidance they offer (Walton and Sartor 2013).
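The foundation-to-scheme correspondences discussed above can be summarised as a simple lookup table. The sketch below (in Python, with hypothetical names) encodes only the mappings named in this section; it is an illustrative summary, not an exhaustive account of the argument schemes literature.

```python
# Mapping from MFT moral foundations to the candidate argument schemes
# discussed in this section (scheme names follow Walton et al. 2008 and
# the other works cited above). Names here are illustrative.
FOUNDATION_TO_SCHEMES = {
    "Care/Harm": ["Argument from Consequences"],
    "Fairness/Cheating": ["Argument from Fairness (Justice)"],
    "Loyalty/Betrayal": ["Argument from Commitment", "Argument from Sacrifice"],
    "Authority/Subversion": ["Argument from Authority"],
    "Sanctity/Degradation": ["Argument from Sacred Scriptures"],
}

def schemes_for(foundation):
    """Return the candidate argument schemes for a moral foundation."""
    return FOUNDATION_TO_SCHEMES.get(foundation, [])
```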

By integrating MFT and Argumentation Theory, this paper aims to address the previously unexplored link between arguments and morality (van Eemeren and van Haaften 2023; Frank 2004) by providing a comprehensive approach to moral reasoning: detecting, analysing and evaluating moral foundations in arguments. Our fundamental challenge is to develop a reliable approach that combines quantitative and qualitative methods, enabling us to identify, analyse and interpret users’ moral foundations as conveyed through diversified arguments sourced from socio-political corpora: (i) offline debates in the British Parliament during Thatcher’s era, the BBC Radio 4 ‘Moral Maze’ programmes, and the US Presidential TV debates in 2016; as well as (ii) online discussions on social media about COVID-19 in 2022, and Reddit reactions to the US 2016 Presidential TV debates.

The study seeks to answer the following question: How can we comprehensively analyse the moral foundations inherent in arguments using both quantitative and qualitative approaches? To this end, we developed Moral Arguments Analytics, MArgAn, an AI-based technology designed for comprehensive analysis of moral foundations in argumentation across various domains. More specifically, it allows for large-scale discourse analysis, i.e., for meaningful interpretation of vast amounts of information on how people use moral arguments in the analysed corpora. In this paper, we apply MArgAn in three case studies, wherein we formulate and test the following hypotheses: [H1] People predominantly rely on moral arguments, whether in online or offline discussions, rather than using arguments that do not involve moral considerations; [H2] The effectiveness of an argument in influencing individuals’ perspectives and actions varies depending on the moral foundations employed, with some foundations being more persuasive in certain contexts than others; [H3] There are significant disparities in the use and valence of moral foundations in arguments between offline and online dialogues.

The primary accomplishments of this study, characterised by significant contributions, can be succinctly summarised as follows: (i) We conducted an assessment of two lexicon-based approaches (MFD and eMFD) to detect moral foundations, utilising data from diversified discourses. The outcomes not only validate the theoretical soundness of MFD in identifying moral foundations but also shed light on its limitations. (ii) We introduced an AI-based argument technology explicitly tailored for the analysis of moral dispositions within argumentation. This pioneering effort marks the initial step in creating an AI component imbued with moral reasoning, capable of discerning moral considerations expressed through linguistic cues.

2 Related Works

Exploring morality through natural language processing (NLP) is receiving growing attention, particularly due to the substantial volume of text data generated on social media platforms. Firstly, these studies often limit their analysis to a basic differentiation between the presence or absence of moral foundations in the text, employing manual, automatic, or hybrid methods (e.g. Asprino et al. 2022; Atari et al. 2023; Efstathiadis et al. 2022; Garten et al. 2018, 2016; Hoover et al. 2019; Teernstra et al. 2016; Trager et al. 2022).

Secondly, while most of these approaches enable binary classification of morals, only the Word Embedding-based Moral Foundation Assignment (WEMFA) assigns multiple moral foundations to a single utterance (González-Santos et al. 2023). Next, the majority of approaches concentrate on the analysis of morality in separate text messages (e.g. Fulgoni et al. 2016; Hoover et al. 2019; Johnson and Goldwasser 2018, 2019; Kaur and Sasahara 2016; Lin et al. 2020; Rezapour et al. 2019; Sylwester and Purver 2015; Volkova et al. 2017) rather than examining the moral load of dialogical arguments. However, interactions between users, shaped by their moral beliefs, have already been captured as connections linking tweets and their replies (either attack or support) in Twitter conversations annotated by human experts (not automatically) in the MoralConvITA dataset (Stranisci et al. 2021).

Recently, some research has used deep-learning techniques to extract expressed moral foundations from texts (Araque et al. 2020; González-Santos et al. 2023; Huang et al. 2022; Liscio et al. 2022; Pavan et al. 2020; Rezapour et al. 2019; Trager et al. 2022). While these approaches capture contextual information or even support cross-domain transfer, only certain studies (e.g., Pacheco et al. 2022; Roy et al. 2021) allow for the identification of the relational structure within moral actions. Roy et al. (2021) explored two established approaches for tackling relational learning challenges: Probabilistic Soft Logic (PSL) and Deep Relational Learning (DRaiL). They used these approaches to predict moral foundations, which are treated as frame predicates linked to positive and negative entity roles (e.g., an entity ensuring care, an entity causing harm), demonstrating the efficacy of incorporating dependencies and contextual information. Pacheco et al. (2022) proposed a combination of DRaiL, deep neural networks, and statistical relational learning (SRL) to establish connections between fine-grained morals, stance and arguments at the user level. To achieve this, they distinguished two types of roles: an actor (“do-er”, i.e., the “entity doing good/bad”) responsible for actions or influence that might result in a certain outcome for the target (“do-ee”, i.e., the “entity benefiting/suffering”). This further allowed for discovering correlations between the arguments made, their moral frames, stances, and reasons, and the dependencies between them. While various studies exploring morality through NLP have made significant strides, there remains a notable gap in examining how moral dispositions are employed within arguments. Despite advances in techniques such as deep learning and relational learning, limited attention has been given to the intricate interplay between moral reasoning and arguments at a fine-grained level.
This suggests an opportunity for further investigation into the nuanced ways in which morality shapes and manifests within dialogical exchanges and arguments. In response to this research gap, we have developed analytics utilising the Moral Foundations Dictionary (MFD) and the extended Moral Foundations Dictionary (eMFD). Additionally, we have designed a visualisation interface that facilitates a foundation-driven approach to argument analysis, with modules for analysing moral discourse, interlocutor distribution, moral scores, and moral word occurrence. Our system integrates a crucial exploratory quantitative module to facilitate an in-depth explanatory qualitative analysis of moral discourse.

3 Data and Methodology

This section delineates the methodological pipeline for analysing moral foundations in argumentation, encompassing stages from data collection (see Sect. 3.1) and annotation (Sect. 3.2) to the design of analytics (Sect. 3.3).

3.1 Data Collection

In this study, we utilise five corpora (available at http://corpora.aifdb.org/) annotated with supporting and attacking arguments according to Inference Anchoring Theory (see Sect. 3.2.1): US2016tv, US2016reddit, PolarIs1, Hansard, and Moral Maze. Collectively, these corpora encompass discussions spanning a wide spectrum of topics, from political elections and vaccine conspiracy theories to common moral issues. We have categorised the corpora based on their respective modes of dialogical argument exchange into offline (face-to-face) and online discussions (see Table 2).

Table 2 Statistical summary of words, ADUs, and speakers in the corpora

3.1.1 Offline Dialogue

Face-to-face discussions, often characterised as offline dialogues, are a fundamental form of human communication. To integrate the dynamics of these offline dialogues into the analysis of moral arguments, we have chosen three specialised corpora enriched with argumentation annotations. The Hansard corpus is derived from the official records of the British Parliamentary debates, i.e., Hansard. Covering the period 1979–1990, it was originally curated for ethos annotation and ethos mining (Duthie et al. 2016) and then annotated according to Inference Anchoring Theory (Budzynska and Reed 2011). The Moral Maze corpora consist of transcripts of discussions on the Moral Maze Radio 4 programme, where panellists explore various moral and ethical dilemmas (Budzynska et al. 2015). For the purpose of our study, we have selected five discussions that exemplify domain diversity and uphold data quality: British Empire, DDay, Morality of Money, Morality of Hypocrisy, and Welfare State. The US2016tv corpus contains arguments exchanged during the TV debates between presidential candidates in the 2016 US elections (Visser et al. 2020). It consists of three collections: US2016R1tv, covering a Republican primary TV debate; US2016D1tv, a Democratic primary TV debate; and US2016G1tv, a televised debate of the 2016 general election.

3.1.2 Online Dialogue

In the context of this study, online dialogue encompasses conversational interactions occurring among online users through their posts and replies. To facilitate our analysis, two corpora were curated, both originally extracted from the Reddit social platform. The PolarIs1 corpus (Polarisation Issues subcorpus, available at https://newethos.org/resources/) comprises manually collected Reddit posts discussing conspiracy theories related to COVID-19 vaccination, sourced from two threads within the “r/conspiracy” subreddit. The US2016reddit corpus is a sub-corpus of US2016 (Visser et al. 2020) containing real-time user reactions on Reddit to the TV debates between presidential candidates in the 2016 US elections (see above). It comprises three sub-corpora: US2016D1reddit, US2016G1reddit, and US2016R1reddit.

3.2 Data Annotation

These datasets were selected because they already contain argument annotation, which is particularly time-consuming to produce. The five corpora with diverse moral profiles had previously been annotated with pro- and con-arguments according to Inference Anchoring Theory (Sect. 3.2.1). Next, we applied moral foundation detection for subsequent analysis and evaluated the suitability of our corpora for moral foundation analysis within argumentation (Sect. 3.2.2).

3.2.1 Argument Annotation

Inference anchoring theory (IAT) provides a theoretical framework for argument annotation in dialogue (Budzynska and Reed 2011) and serves as a valuable tool for argumentation analysis and dialogical argument mining. IAT integrates elements from argumentation theory, speech act theory, and dialogue theory to comprehensively portray the interplay among diverse communication structures. More specifically, IAT anchors logical inference in dialogical structures through various illocutionary forces.

In adherence to the principles of IAT, the procedure for argument annotation unfolds as a structured sequence of five fundamental stages: (i) segmenting utterances into Argumentative Discourse Units (ADUs) based on their propositional contents and argumentative functions; (ii) reconstructing propositional contents of ADUs (locutions); (iii) establishing transitional connections between ADUs; (iv) linking ADUs and propositions via illocutionary connections; and (v) identifying argument relations, such as Inference (Support), Conflict (Attack) and Rephrase, between the reconstructed propositional contents.
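The annotation stages above can be illustrated with a minimal data model. This is a deliberately simplified, hypothetical structure for exposition only, not the actual AIFdb/IAT representation, which is considerably richer.

```python
from dataclasses import dataclass
from typing import List

# Simplified sketch of the IAT annotation outcome: segmented ADUs with
# reconstructed propositional content (stages i-ii) and argument relations
# between them (stage v). Transitions and illocutionary anchoring
# (stages iii-iv) are omitted for brevity.

@dataclass
class ADU:
    text: str      # reconstructed propositional content
    speaker: str

@dataclass
class Relation:
    kind: str            # "Inference" (Support), "Conflict" (Attack), "Rephrase"
    premises: List[ADU]  # input ADUs: premises/reasons, or attackers
    conclusion: ADU      # output ADU: conclusion, or the attacked text

def is_support(rel):
    return rel.kind == "Inference"

def is_attack(rel):
    return rel.kind == "Conflict"
```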

In total, we have collected 8,816 IAT-annotated arguments (see Table 3). Our analysis of moral foundations in argumentation primarily focuses on two classical notions of arguments: Supports (pro-arguments) and Attacks (con-arguments). Notably, supports are more prevalent than attacks across all the corpora we collected.

Table 3 Statistics on arguments

3.2.2 Moral Foundation Detection

To explore the presence of moral elements within dialogical arguments, we employed the Moral Foundations Dictionary (MFD, Graham et al. 2009). This dictionary, an enhanced adaptation of the Linguistic Inquiry and Word Count program (LIWC, Pennebaker and Francis 1999), provides information on the proportions of the five universal moral foundations. After the validation of the moral domains by Graham et al. (2011), two subsequent dictionaries were introduced to broaden the scope of moral dispositions: the Moral Foundations Dictionary 2 (MFD2, Frimer 2019), which offers a wider range of moral sentiment expressions, and the extended Moral Foundations Dictionary (eMFD, Hopp et al. 2021), which derives from large-scale textual annotations developed by human coders.
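A lexicon-based detector in the spirit of MFD can be sketched as a simple stem-matching routine. The toy lexicon below is illustrative only — the real MFD word lists are far larger — and the stem-prefix matching strategy is an assumption of this sketch.

```python
# Toy stem lexicon mapping word stems to foundations (NOT the actual MFD
# word lists). Stems such as "harm" belong to the Care/Harm foundation.
TOY_MFD = {
    "care": "Care", "kind": "Care", "harm": "Care",
    "fair": "Fairness", "justice": "Fairness", "cheat": "Fairness",
    "loyal": "Loyalty", "betray": "Loyalty",
    "obey": "Authority", "rebel": "Authority",
    "pure": "Sanctity", "degrad": "Sanctity",
}

def detect_foundations(text):
    """Return the set of foundations whose lexicon stems occur in the text
    (multi-label: one ADU may trigger several foundations)."""
    found = set()
    for token in text.lower().split():
        for stem, foundation in TOY_MFD.items():
            if token.startswith(stem):
                found.add(foundation)
    return found
```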

In this study, we decided to utilise the Moral Foundations Dictionary, which offers a list of words associated with the five primary moral foundation categories and is suitable for our research purposes. Regarding valence categorisation, we assumed that each moral category could manifest either a positive (+) or a negative (−) polarity (Pacheco et al. 2022): Care+/−, Fairness+/−, Loyalty+/−, Authority+/−, and Sanctity+/−, resulting in 10 categories of moral expressions in argumentation. Given the multi-label classification task, the performance of MFD was evaluated against a gold standard of human annotations derived from a sample of 500 ADUs. To achieve a balanced evaluation set, we sampled each moral category based on the frequency of the least common moral categories in each dataset, and then combined all the samples from different datasets to form the evaluation collection. The evaluation process involved calculating both the accuracy and the F1 score for each category; these metrics were then used to compute the average accuracy and average macro-F1 score, providing a comprehensive assessment of MFD’s ability to predict moral judgements. The MFD demonstrated reliable performance, achieving an accuracy of 83.12% and a macro-F1 score of 0.629, outperforming the eMFD method, which reached an accuracy of 80.42% and a macro-F1 score of 0.53.
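The multi-label evaluation described above can be reproduced in outline as follows. The labels are toy examples, not the 500-ADU gold standard, and the exact averaging conventions used in the study may differ from this sketch.

```python
# Per-category accuracy and F1 for multi-label moral categories, averaged
# into the average accuracy and macro-F1 reported above (toy sketch).
def f1(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_f1(gold, pred, categories):
    """gold, pred: parallel lists of label sets, one set per ADU."""
    scores = []
    for c in categories:
        tp = sum(1 for g, p in zip(gold, pred) if c in g and c in p)
        fp = sum(1 for g, p in zip(gold, pred) if c not in g and c in p)
        fn = sum(1 for g, p in zip(gold, pred) if c in g and c not in p)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)

def avg_accuracy(gold, pred, categories):
    """Mean over categories of the per-category binary accuracy."""
    per_cat = []
    for c in categories:
        correct = sum(1 for g, p in zip(gold, pred) if (c in g) == (c in p))
        per_cat.append(correct / len(gold))
    return sum(per_cat) / len(per_cat)
```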

Finally, we assessed the prevalence of moral foundations to ascertain the appropriateness of the data with respect to the research questions. The preliminary analysis (see Table 4) shows that our corpora contain discernible moral foundations (30%). This richness of moral foundations in ADUs is crucial for our further analytical design.

Table 4 Preliminary analysis: general moral foundation distribution

3.3 Data Analysis

Utilising the annotations of moral foundations within argumentative discourses, we have developed the Moral Argument Analytics Interface (MArgAn). The tool is built upon Argument Analytics (Lawrence et al. 2016, 2017), which computes properties of an argument graph and interprets them as properties of a debate. The MArgAn interface is designed to augment scalable data analysis by processing a variety of datasets, facilitating the creation of customised datasets, and offering adaptive data visualisation through a flexible colour scheme and a range of infographics. This supports both quantitative (Sect. 3.3.1) and qualitative analyses (Sect. 3.3.2), thereby enabling a thorough investigation of arguments annotated with moral foundations.

3.3.1 Quantitative Analysis

In order to achieve our analytical goals, we have designed and implemented the following key modules, each capable of generating either visualisation outputs or statistical tables. The Moral Foundation Distribution module allows us to analyse the prevalence of moral foundations within argumentative discourse. It supports both comparative analysis across corpora and analysis within a single corpus, and enables the examination of morals on various moral scales, such as the presence or absence of moral content or the polarity of moral foundations. The calculation of moral foundation distribution can be performed based on ADUs (ADU-based) or arguments (relation-based). Additionally, the system can dissect the distribution of moral foundations based on argument properties (i.e., supports or attacks, constructed by the same speaker or not) and ADU characteristics (i.e., supports or attacks, inputs or outputs of arguments).
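An ADU-based distribution of this kind can be sketched in a few lines, under the assumption that each ADU carries a (possibly empty) set of moral category labels; the function name and input format are hypothetical.

```python
from collections import Counter

# Sketch of an ADU-based moral foundation distribution: the share of ADUs
# tagged with each category (an ADU may carry several categories, so the
# shares need not sum to 1).
def foundation_distribution(adu_labels):
    """adu_labels: list of sets of category names, one set per ADU."""
    if not adu_labels:
        return {}
    counts = Counter(cat for labels in adu_labels for cat in labels)
    n = len(adu_labels)
    return {cat: c / n for cat, c in counts.items()}
```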

Interlocutors Distribution is a speaker-oriented (entity-based) analytic tool designed to analyse how various speakers employ moral valence strategies across different moral foundations. We categorise these strategies into four types: non-specific moral foundation (non), exclusively negative moral foundation (only −), exclusively positive moral foundation (only +), and a combination of both (mixed). To interpret this data, the system can generate a heatmap that provides an overview of the distribution across all five moral foundations, or bar charts for a detailed analysis within a single moral foundation. Interlocutors Moral Scores is an entity-based module that calculates and presents the average moral scores of speakers in a particular corpus. It breaks down each speaker’s moral score into five moral foundations, corresponding to the percentage of ADUs associated with each of the five moral foundations. The Word Cloud module visualises the occurrence of moral foundation words within a corpus. It facilitates a deeper textual analysis by presenting the ADUs (ADU-based) or arguments (relation-based) containing these moral foundation words.
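The four valence strategies used by the Interlocutors Distribution module can be sketched as a simple classification over a speaker's ADU labels; the function name and input format are assumptions of this illustration.

```python
# Classify one speaker's use of a single foundation across their ADUs into
# the four strategies described above: 'non', 'only -', 'only +', 'mixed'.
def valence_strategy(adu_categories, foundation):
    """adu_categories: list of label sets (e.g. {"Care+", "Loyalty-"}),
    one set per ADU produced by the speaker."""
    has_pos = any(foundation + "+" in cats for cats in adu_categories)
    has_neg = any(foundation + "-" in cats for cats in adu_categories)
    if has_pos and has_neg:
        return "mixed"
    if has_pos:
        return "only +"
    if has_neg:
        return "only -"
    return "non"
```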

3.3.2 Qualitative Analysis

MArgAn allows for a comprehensive interpretation of the outcomes obtained from quantitative analysis, thereby deepening our understanding of the underlying data. For qualitative comparative corpora analysis, the analytical modules can be used in various combinations. In Study 1, we aggregated all argument properties, including Support or Attack, and whether arguments were constructed by the same or different speakers. These were then presented on a Moral Scale, delineating between Moral and No-moral categories. Moreover, we examined same and different speakers, focusing on supporting versus attacking arguments, categorised as Moral versus No-moral, and explored the distinctions between offline and online datasets. The input to an Inference or Conflict node in an argument graph is one or more text segments, which we refer to as Argumentative Discourse Units (ADUs); in inference, these are traditionally known as premises or reasons, while in conflict they are referred to as attackers. The output of an Inference or Conflict node is a single ADU: traditionally termed a conclusion in inference and, in conflict, the text that is under attack. The outcomes were quantified and represented as percentages across all the datasets.

In Study 2, we analysed the properties of ADUs across all ten moral foundations, five positively and five negatively oriented. Our analysis took into account whether the ADUs were supportive or attacking, and considered both the input and output sides of arguments. The results are presented as percentages. Finally, in Study 3, we consolidated all ADU properties (Support or Attack, Input and Output) while differentiating between datasets derived from offline and online sources. The collated properties were first visualised on a Moral Scale distinguishing the Moral and No-moral categories; we then examined all ADU properties across the ten moral foundations, including the positive and negative valences present within the dataset. The results are again presented as percentages.

4 Morals in Arguments

In Study 1, we tested hypothesis [H1], which suggests that people predominantly rely on moral arguments, whether in online or offline discussions, rather than on no-moral arguments. The primary goal is to examine the prevalence of moral arguments in contrast to no-moral arguments across all datasets, in both online and offline discussions. This objective aligns with [H1] and seeks to empirically analyse the extent to which moral arguments are favoured over no-moral arguments in different settings.

Table 5 presents a statistical summary. Moral and no-moral arguments are roughly equally distributed. In discussions, whether held offline or online, participants consistently incorporate moral foundations into their arguments, irrespective of the specific topic under discussion, but moral arguments do not overwhelmingly dominate the discussions.

Table 5 A Comparative Distribution of Moral and No-Moral Arguments

Complex Nature of Argumentation. A slight dominance of moral arguments is evident across a range of subjects, such as discussions regarding vaccinations on Reddit (56% of arguments refer to moral foundations), the US presidential elections (55%), and the British parliamentary debates (52%) (see Fig. 1).

Fig. 1 A comparative analysis of moral and no-moral arguments across datasets

Surprisingly, 54% of arguments in the BBC’s Moral Maze radio programme, as well as 60% of those in the 2016 US election discussion on Reddit (which ran parallel to the televised presidential candidate debates), were no-moral arguments.

The chi-square test was conducted to examine the association between moral foundation presence and different datasets. The analysis revealed a significant association (\(\chi ^2 = 148.15\), \(df = 4\), \(p < 0.001\)), suggesting that moral foundation presence varies significantly across different datasets. The extremely small p value of \(5.076\times 10^{-31}\) further confirms the statistical significance of this association. Evidently, argumentation is a complex and context-sensitive process. The balance between moral and no-moral arguments highlights the adaptability and flexibility of arguers in selecting the most effective strategies based on the context and goals of the discussion.
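For even degrees of freedom, the chi-square p value admits a closed form, so the reported result can be checked with the standard library alone. This is a verification sketch, not the analysis pipeline used in the study.

```python
import math

# Chi-square survival function, valid for EVEN degrees of freedom:
#   P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
def chi2_sf_even_df(x, df):
    assert df % 2 == 0, "closed form holds for even df only"
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

# Reproducing the reported test (chi2 = 148.15, df = 4):
p = chi2_sf_even_df(148.15, 4)
```

For χ² = 148.15 and df = 4 this yields approximately 5.07 × 10⁻³¹, in line with the p value of 5.076 × 10⁻³¹ reported above.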

Fig. 2 A comparative analysis of (same and different) speaker-supported arguments based on moral and no-moral approach across datasets

Nature of Disputes. The same pattern is also evident when examining arguments that support the main claims (see Fig. 2) or attack them (Fig. 3).

Fig. 3 A comparative analysis of (same and different) speaker-attacked arguments based on moral and no-moral approach across datasets

In the examination of argumentative attacks, a disparity in the distribution of moral and no-moral content is evident. Specifically, the “No Morals Baseline”, which represents arguments without explicit moral content, is 4.9 percentage points higher than the “Morals Baseline” (52.45% vs. 47.55%), suggesting a more prevalent occurrence of no-moral arguments in these cases. Although this difference is noticeable, it is not statistically significant, as indicated by a t-test (\(t = 0.59\), \(p = 0.571\)). It could be inferred that the selection of argument type when challenging claims is shaped by the nature of the disputes, the diversity of discussions across various contexts, and the degree of moral content integration in these debates.

Offline vs. Online: A Tale of Two Realities. Upon comparing the offline and online datasets, a notable divergence becomes evident in the utilisation of moral and no-moral arguments within offline and online contexts. In offline discussions, when supporters are bolstering their claims, there is a more pronounced emphasis on moral orientation (see Fig. 4). Notably, over 52% of all arguments in datasets such as OFF Hansard, OFF Moral Maze, and OFF US 2016 TV Debate refer to moral foundations. Conversely, in online discussions, the majority of supporting arguments, accounting for more than 53%, are founded on no-moral considerations.

Fig. 4 A comparative analysis of (same and different) speaker-supported arguments based on moral and no-moral approach in offline and online discussions

To further investigate this, a Chi-square test was conducted for speaker-supported arguments. This test analysed the association between argument types (Morals/No morals) and dialogue mode (Offline/Online). The resulting Chi-square statistic was \(\chi ^2 = 23.33\), with degrees of freedom \(df = 1\) and a p value \(p < 0.001\). This result indicates a highly significant association (\(p = 1.365 \times 10^{-6}\)) between the presence of a moral foundation in arguments and the dialogue mode. This suggests that the mode of dialogue (offline or online) might significantly influence the presence of moral foundations in support arguments. Offline discussions tend to emphasise moral foundations when supporting claims, which might indicate a stronger reliance on ethical considerations in these contexts.
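For df = 1, the 2×2 case analysed here, the p value follows from the identity P(X > x) = erfc(√(x/2)). The sketch below computes the statistic from a hypothetical 2×2 contingency table — the study's underlying counts are not given in the text — and the usage note checks the reported p value directly from the reported statistic.

```python
import math

# Chi-square statistic (no continuity correction) and df = 1 p value for a
# 2x2 contingency table
#   [[a, b],   rows: Morals / No morals
#    [c, d]]   cols: Offline / Online   (hypothetical layout)
def chi2_2x2(a, b, c, d):
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, math.erfc(math.sqrt(stat / 2))  # P(X > stat) at df = 1
```

Plugging the reported statistic directly into the identity, `math.erfc(math.sqrt(23.33 / 2))` gives roughly 1.36 × 10⁻⁶, matching the reported p = 1.365 × 10⁻⁶.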

Fig. 5 A comparative analysis of (same and different) speaker-attacked arguments based on moral and no-moral approach in offline and online discussions

The situation undergoes a surprising transformation when arguers challenge or dispute the initial argument or claim (Fig. 5). In offline discussions, we note a striking shift, with over 52% of attacking arguments being grounded in no-moral considerations. Conversely, in online discussions, a remarkable balance is observed: roughly 50% of arguments draw upon moral considerations and nearly an equal 50% rely on no-moral ones. Offline discussions might involve arguments challenging the moral foundations of a claim, leading to the prevalence of no-moral arguments. In contrast, online discussions show a more balanced use of moral and no-moral arguments when disputes occur, suggesting that the online environment may accommodate a wider range of argumentative strategies. This result led us to investigate the relationship further. A Chi-square test was conducted to assess the association between argument types (Morals/No morals) and dialogue mode (Offline/Online), yielding \(\chi ^2 = 1.48\), \(df = 1\), \(p = 0.224\). Since the p value is greater than the chosen significance level of 0.05, we fail to reject the null hypothesis: there is no significant association between argument type and dialogue mode for speaker-attacked arguments.

Context Matters. The influence of context is apparent, highlighting the significance of context-specific considerations when analysing the deployment of both moral and no-moral arguments. It implies that the discourse environment and the specific discourse roles (supporting or challenging) significantly influence the prevalence of moral foundations in arguments.

5 Types of Moral Foundations in Moral Arguments

In Study 2, we examined the hypothesis [H2] that the effectiveness of an argument in swaying individuals’ perspectives and actions is contingent upon the specific moral foundations employed, with these foundations varying in their persuasive power depending on the context. Our aim was to empirically probe the role of a spectrum of moral foundations in the formulation of arguments across diverse topics. This objective dovetails with [H2], and is designed to delve into the influence of different moral foundations on the construction of arguments, and how these foundations shape individuals’ viewpoints within various dialogic contexts.

Moral Foundations in Supporting Arguments. Arguments are rooted in particular moral foundations, with the specific foundation varying based on the context. When analysing input supporting arguments, which reinforce or agree with the initial argument or claim, it becomes evident that offline political discussions, i.e., Hansard and the 2016 US TV Debate, more frequently incorporate foundations like Loyalty+, Authority+, and Care+ (see Fig. 6). In the first instance (OFF Hansard), these arguments predominantly referenced Authority+ (36.24%), Care+ (31.88%), and Loyalty+ (29.69%), summing up to a substantial 97.81% of all arguments presented during the debates.

Fig. 6
figure 6

A comparative analysis of input speaker-supported arguments based on moral foundations across datasets

In the second instance (the OFF US2016 TV Debate), the predominant foundations in the same arguments, i.e., Loyalty+ (39.81%), Authority+ (26.68%), and Care+ (17.24%), account for a combined total of 83.73% of all arguments. When examining the same cases while focusing on output supportive arguments, we notice that significant differences are not readily apparent (Fig. 7). In the OFF Hansard dataset, the Authority+ score decreased to 33.17%, the Care+ score to 31.73%, and the Loyalty+ score to 27.4%, a slight decrease in the combined total from 97.81% to 92.3% of all arguments presented during the debates. In the OFF US2016 TV Debate dataset, the Loyalty+ score rose to 43.03% and the Authority+ score increased to 32.76%, whereas the Care+ score decreased to 15.16%. Consequently, the combined total of these three foundations increased from 83.73% to 90.95%. In the case of OFF Hansard, there is notable stability in moral foundations in the transition from input to output arguments. In the OFF US2016 TV Debate, however, the distribution shifts noticeably between input and output arguments.

Fig. 7
figure 7

A comparative analysis of output speaker-supported arguments based on moral foundations across datasets

For less political discussions, such as COVID vaccination in the ON PolarIs1 dataset, we noticed a predominant focus on just two foundations: Care+ (accounting for 38.34% of the arguments) and Sanctity− (accounting for 26.58%) (see Fig. 6, Table 6). Together, these two foundations account for a combined total of 64.92% of the input arguments made by supporters. A high Care+ score aligns with the idea that vaccination is an act of social responsibility to prevent the spread of the virus, reduce strain on healthcare systems, and protect vulnerable populations. The Sanctity− (degradation) foundation comes into play in discussions of vaccine hesitancy, misinformation, or vaccine-related ethical concerns. Some supporters argue that protecting public health and adhering to scientific guidelines are more important than certain traditional or purity-based beliefs. In output supportive arguments, Care+ usage increased modestly to 40.92% while Sanctity− decreased slightly to 23.92%, with almost no impact on the combined total, which remained at virtually the same level (64.84%) (see Fig. 7). This suggests a nuanced shift in the moral underpinnings of discussions when moving from input to output arguments in these contexts.

Table 6 Qualitative analysis of representative moral foundations in ON PolarIs1

Moral Foundations in Attacking Arguments. Similar patterns to those in supporting arguments are observed in attacking arguments (see Figs. 8, 9).

Fig. 8
figure 8

A comparative analysis of input speaker-attacked arguments based on moral foundations across datasets

However, when comparing the moral appeals used in constructing supportive (see Fig. 6) and attacking arguments (see Fig. 8), interesting observations emerge. Supporting arguments tend to display elevated scores on certain moral foundations compared to attacking arguments, though the pattern varies by foundation. A striking example can be seen in the OFF US2016 TV Debate. The moral foundation of Loyalty+ is represented at a rate of 39.82% in supporting arguments, compared to a lower rate of 29.82% in attacking arguments. Conversely, the moral foundation of Authority+ is less prevalent in supporting arguments, appearing at a rate of 26.68%, while it is significantly more common in attacking arguments, with a representation rate of 52.63%. These data suggest a distinct difference in the moral foundations appealed to in supporting versus attacking arguments within this context. Moreover, the quite high values of Loyalty+ observed across all discussions, as shown in Table 7, suggest that it plays a significant role in supporting arguments in various contexts. This observation is further emphasised by the elevated scores of Authority+, as in the OFF US2016 TV Debate and OFF Hansard, although it does not hold for OFF Moral Maze and ON PolarIs1, where Care+ and Sanctity− are more often referred to. Furthermore, attacking arguments consistently exhibit higher scores of Care+ than supportive arguments in discussions like OFF Moral Maze, ON US2016 Reddit, and ON PolarIs1. However, this pattern does not hold in political datasets (see Table 8) such as OFF Hansard (British Parliamentary Debates) and OFF US2016 TV (Presidential Election Debates).

Fig. 9
figure 9

A comparative analysis of output speaker-attacked arguments based on moral foundations across datasets

Table 7 Loyalty+ scores in supporting arguments and attacking arguments across datasets
Table 8 Care+ scores in attacking arguments and supporting arguments across datasets

6 Moral Arguments in Types of Discourse

In Study 3, we hypothesise [H3] that there are significant disparities in the use and valence of moral foundations in arguments between offline and online datasets. By exploring how individuals use moral arguments to express their perspectives and influence others, we aim to provide insights into the differences in argumentative strategies across these two communication contexts.

Differing Emotional Tone in Moral Arguments. The data suggest a notable disparity in the valence of moral arguments between offline and online discussions (see Fig. 10). The Chi-square test, performed to investigate the relationship between moral valence (positive/negative) and dialogue mode (online/offline), indicated a significant association (\(\chi ^2 = 206.16\), \(df = 1\), \(p < 0.001\)); the exact p value obtained was \(9.472 \times 10^{-47}\). This indicates that the observed association is highly unlikely to be due to chance. In offline discussions, approximately 26.1% of moral arguments have a negative valence, while this figure rises significantly to 49.28% in online discussions.

Fig. 10
figure 10

A comparative analysis of the valence in moral foundations in offline and online datasets

In online discussions, the share of moral arguments with negative valence (49.28%) exceeds the overall baseline of 37.69% by 11.59 percentage points. Conversely, moral arguments with positive valence are more prevalent in offline discussions, comprising around 85.95% of such arguments, compared to 67.03% in online discussions; the offline share surpasses the overall baseline of 76.49% by 9.46 percentage points. These findings indicate a clear contrast in the emotional tone of moral arguments between the two settings.

Contrasting Dominance of Moral Foundations. Offline discussions are dominated by a few foundations, while online discussions exhibit a more even distribution across many foundations. A Chi-square test was conducted to assess the relationship between moral foundation distribution and dialogue mode. The analysis yielded a chi-square statistic of 921.08 with 9 degrees of freedom (\(\chi ^2 = 921.08\), \(df = 9\), \(p < 0.001\)). The p value associated with the test was extremely small (\(p=1.77 \times 10^{-192}\)), indicating a highly significant association between moral foundation distribution and dialogue mode. This divergence becomes particularly pronounced in the way different moral foundations are emphasised in arguments (see Fig. 11). Loyalty+ (37.1%), Authority+ (34.75%), and Care+ (21.4%)

Fig. 11
figure 11

A comparative analysis of moral foundations in offline and online datasets

play a significant role in shaping arguments in offline discussions, constituting a dominant total of 93.25%. In contrast, the distribution of moral foundations in online discussions takes the following form: Care+ (31.91%), Care− (18.57%), Fairness+ (5.61%), Fairness− (3.69%), Loyalty+ (20.35%), Loyalty− (0%), Authority+ (12.91%), Authority− (1.73%), Sanctity+ (9.07%), Sanctity− (28.55%), with the top three foundations (Care+, Sanctity−, and Loyalty+) accounting for a cumulative total of 80.81%. Online discussions encompass a more diverse array of moral foundations, distributed more evenly, signifying a greater plurality of viewpoints and argumentation strategies in online communication; notably, online arguers never reference the Betrayal foundation (Loyalty−).
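As a sanity check on the bookkeeping above, the cumulative top-three share of the online distribution can be recomputed directly; the percentages below are copied verbatim from the text, and the small helper is ours, not part of MArgAn:

```python
# Reported online distribution of moral foundations (percentages from the text).
online = {
    "Care+": 31.91, "Care-": 18.57,
    "Fairness+": 5.61, "Fairness-": 3.69,
    "Loyalty+": 20.35, "Loyalty-": 0.0,
    "Authority+": 12.91, "Authority-": 1.73,
    "Sanctity+": 9.07, "Sanctity-": 28.55,
}

def top_share(dist, k):
    """Sum of the k largest shares in a foundation distribution."""
    return round(sum(sorted(dist.values(), reverse=True)[:k]), 2)

top3 = sorted(online, key=online.get, reverse=True)[:3]
print(top3)                  # ['Care+', 'Sanctity-', 'Loyalty+']
print(top_share(online, 3))  # 80.81
```

Note that the shares do not sum to 100%, since a single argument can invoke more than one foundation.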

Diversified Distribution of Moral Foundations between Interlocutors. The data indicate differences in the distribution of moral foundations, in both their positive and negative forms, between online and offline interlocutors (see Fig. 12). In online discussions, interlocutors emphasise Care+ (10.3%) and Care− (7.58%), which together amount to 17.88% for the Care foundation. In the offline context, Care+ (21.07%) and Care− (6.78%) collectively contribute 27.85% for the Care foundation.

Fig. 12
figure 12

A comparative analysis of moral foundation distribution between online and offline interlocutors

This demonstrates that offline interlocutors place a more substantial emphasis on care-related morals than their online counterparts.

In online discussions, participants bring up Fairness+ (3.94%) and Fairness− (2.26%), which combined represent 6.2% for the Fairness foundation. In contrast, in offline contexts, Fairness+ (12.59%) and Fairness− (4.36%) together constitute 16.95%. The fairness-related foundation is thus significantly more prevalent in offline interactions.

In online discussions, interlocutors mention Loyalty+ (10.19%) and Loyalty− (0.06%), which together account for 10.25% for the Loyalty foundation. In offline contexts, on the other hand, Loyalty+ (35.35%) and Loyalty− (0.73%) together make up 36.08%. Offline interlocutors consistently exhibit a stronger emphasis on the loyalty-related foundation than their online counterparts.

Online interlocutors have a combined total of Authority+ (9.66%) and Authority− (1.27%), amounting to 10.93% for the Authority foundation. In the offline context, Authority+ (38.74%) and Authority− (1.21%) add up to 39.95%. Offline interlocutors thus strongly emphasise the authority-related foundation, while online interlocutors assign it less importance.

Finally, online interlocutors exhibit a combined total of Sanctity− (14.47%) and Sanctity+ (3.99%), which equals 18.46% for the Sanctity foundation; in the offline context, Sanctity− (2.66%) and Sanctity+ (8.47%) combine to 11.13%. Notably, Sanctity is the only foundation on which online interlocutors place a higher emphasis than their offline counterparts.
These findings indicate variations in the distribution of moral foundations between online and offline interlocutors, with the Care, Fairness, Loyalty, and Authority foundations being more prominent in offline than online contexts (Fig. 13).
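The per-foundation comparison above can be condensed into a short aggregation sketch; the positive/negative percentages are copied verbatim from the text, and the helper names are ours, not part of MArgAn:

```python
# Combined (virtue, vice) shares per foundation, as reported in the text.
online = {"Care": (10.3, 7.58), "Fairness": (3.94, 2.26),
          "Loyalty": (10.19, 0.06), "Authority": (9.66, 1.27),
          "Sanctity": (3.99, 14.47)}
offline = {"Care": (21.07, 6.78), "Fairness": (12.59, 4.36),
           "Loyalty": (35.35, 0.73), "Authority": (38.74, 1.21),
           "Sanctity": (8.47, 2.66)}

def totals(dist):
    """Combined virtue + vice share for each foundation."""
    return {f: round(pos + neg, 2) for f, (pos, neg) in dist.items()}

on, off = totals(online), totals(offline)
print(on["Care"], off["Care"])  # 17.88 27.85

# Foundations that online interlocutors emphasise more than offline ones:
print([f for f in on if on[f] > off[f]])  # ['Sanctity']
```

The last line reproduces the observation that Sanctity is the sole foundation weighted more heavily online than offline.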

Fig. 13
figure 13

A comparative analysis of moral foundation scores between online and offline datasets

7 Discussion

The study’s findings contradict hypothesis [H1], which suggested a preference for moral arguments over no-moral ones in both online and offline discussions. Context significantly shapes the prevalence of moral foundations in arguments, with specific roles within a discourse and the type of discourse environment influencing this. Recognising the impact of context is essential for a comprehensive understanding of argument dynamics. While moral appeals are common in discussions, the choice of argument type is context-dependent. In offline discussions, over 50% of arguments rely on moral foundations when reinforcing claims, while online discussions lean towards no-moral considerations for support. When claims are challenged, offline discussions shift to no-moral arguments, while online discussions maintain a balanced use of both, highlighting the adaptability of the online environment. These findings reveal the dynamic role of moral and no-moral arguments across different communication contexts, emphasising the importance of context-related variables in analysing argumentation strategies.

The data corroborate Hypothesis [H2], demonstrating that the potency of an argument can indeed fluctuate based on the specific moral foundations it invokes, with certain foundations proving more persuasive in particular contexts than others. This confirms previous research showing that moral appeals can be highly persuasive and effective (e.g., Clifford and Jerit 2013; Leidner et al. 2018; Luttrell et al. 2019; Täuber and van Zomeren 2013). Political arguments, particularly those reframed to appeal to the moral values of those holding opposing political positions, are typically more effective, as Feinberg and Willer (2015) suggest. This might explain why, in political discussions such as Hansard (British Parliamentary Debates) and the US2016 TV Debate (Presidential Election Debates), the Loyalty+, Authority+, and Care+ foundations dominate in supportive arguments. In the Hansard debates, these foundations accounted for a substantial 97.81% of all arguments. Such principles exhibit remarkable stability in output supportive arguments in these contexts. Conversely, in less political discussions, such as COVID vaccination in the ON PolarIs1 dataset, the Care+ and Sanctity− foundations take precedence in input arguments. The shift from input to output arguments in this context reveals nuanced changes in moral underpinnings (e.g., Voelkel and Feinberg 2018; Feinberg and Willer 2015). Similar patterns emerge in attacking arguments across contexts. Notably, supporting arguments consistently emphasise Loyalty+ more than attacking arguments in various cases, highlighting its prominent role in support. Attacking arguments consistently place greater emphasis on Care+ than supporting arguments, except in political datasets. It is important to note that Loyalty can have varying interpretations among liberal and conservative discussants due to differences in their moral perspectives (Graham et al. 2009, 2011, 2012; Haidt 2012; Rai and Fiske 2011).
Care and Fairness are grouped together as the ‘individualising foundations’, which emphasise the rights and welfare of individuals within social groups. On the other hand, Authority, Loyalty, and Sanctity form the ‘binding foundations’, which are concerned with strengthening group cohesion and unity. Liberals tend to show a stronger preference for the individualising foundations, while conservatives are more likely to endorse the binding foundations (Graham et al. 2009; Haidt and Graham 2007; Hatemi et al. 2019; McAdams et al. 2008; Stewart and Morris 2021). Therefore, individualising foundations appear to be more prevalent in attacking arguments, while binding foundations are more common in supportive arguments, except among those engaged in political debates.

The findings of Study 3 support [H3], illustrating variations in the distribution of moral foundations and differences in emotional tone between offline and online argumentation. Owing to variations in online behaviour (Short et al. 2015; Suler 2005) and the significance of emotions in the social dissemination of moral concepts (Brady et al. 2017), these differences manifest in the realm of moral discourse. Notably, the dominance of morals varies between the two contexts. Offline discussions primarily revolve around the Loyalty+, Authority+, and Care+ foundations, while online discussions display a more diverse distribution, with Care+, Sanctity−, and Care− as prominent foundations. Firstly, this suggests that conservative principles predominantly dominated offline discussions, whereas liberal ones were more prominent online. Secondly, online discourse indicates more varied argumentation strategies. Moral dispositions linked to arguments centred around Care and Fairness exhibit greater popularity among liberals (e.g., Strimling et al. 2019), who tend to be more engaged in online interactions. Furthermore, online discussions feature a higher negative valence, while offline discussions exhibit a more positive valence in moral arguments. This might be explained by negative online behaviour being more socially acceptable in virtual spaces (e.g., Short et al. 2015; Van Bavel et al. 2023). Additionally, we observe differences in the distribution of moral foundations between online and offline arguers. Offline interlocutors emphasise the Care+, Fairness+, Loyalty+, and Authority+ foundations, whereas online interlocutors prioritise the Sanctity foundation, particularly Sanctity−. Care addresses the alleviation of others’ suffering, while Fairness pertains to justice and equality; Loyalty emphasises the importance of the in-group, and Authority concerns respect for higher ranks and tradition.
Sanctity is concerned with sacredness and purity, guiding individuals to avoid behaviours that evoke disgust (Haidt 2007, 2012; Haidt and Joseph 2004). Conservatives tend to prioritise principles such as Loyalty, Authority, and Sanctity to varying degrees, while liberals generally endorse Care and Fairness (Haidt 2007, 2012; Haidt and Joseph 2004; Graham et al. 2009; Haidt and Graham 2007). In face-to-face dialogical interactions, moral foundations with positive valence are often emphasised. Online arguers, on the other hand, tend to protect their viewpoints by appealing to the principle of Sanctity across different contexts.

8 Implications

The introduction of MArgAn, an AI-based argument technology designed for the universal analysis of moral foundations in arguments, carries several significant implications.

Quantitative and Qualitative Synergy. By demonstrating how quantitative analysis can complement qualitative analysis of arguments, MArgAn bridges the gap between these two approaches. This synergy offers a more comprehensive perspective on argumentation, e.g., enabling researchers to explore both the prevalence and emotional tone of moral arguments in discussions.

Efficiency and Scalability. MArgAn’s AI-based technology streamlines the process of identifying and analysing moral arguments within large datasets. This increased efficiency is particularly valuable when working with extensive collections of arguments, as it saves time and resources.

Enhanced Analytical Capabilities. MArgAn’s capabilities foster access to deeper insights into how moral arguments influence and shape human communication. It allows for exploring the prevalence of moral arguments, their valence, and their distribution across different communication contexts, shedding light on the intricacies of argumentation in various domains.

Generalisability. The successful testing of MArgAn on a diverse dataset covering various socio-political issues showcases its generalisability across different domains. This suggests that the framework can be applied to a wide range of topics and discussions, making it a versatile tool.

Advancing Moral Awareness in AI Systems. This advancement forms the basis for the integration of moral awareness into AI systems, bolstering their ability to facilitate moral reasoning. Moreover, the analysis results provide a user-friendly synopsis of the prevalence of moral principles in argumentation, serving as a valuable resource to guide and strengthen the incorporation of moral arguments in AI systems reliant on argumentation.

9 Limitations and Future Study

The limitations of this study underscore potential challenges, particularly those related to misclassifications by the moral foundation detection algorithm, the absence of a solid grounding in argument schemes, and limited resources for argumentation annotation, together with the implications these issues have for future research.

Algorithmic Constraints of MFD/eMFD. The use of lexicon-based detection methods, such as MFD, is common in morality research due to their simplicity. The results confirm the theoretical robustness of MFD in uncovering moral principles. On the other hand, our error analysis with human annotations underscores some limitations. MFD struggles with contextual nuances such as negation, implication, and interactions between multiple moral foundations, leading to overgeneralised findings and a high rate of false positives. This limitation affects the accuracy of our analytical results. Understanding these limitations is of substantial value for enhancing interpretable methods for detecting moral principles. This includes potential refinements to the MFD/eMFD dictionaries in future research, the development of reliable guidelines for moral foundation annotation, and the exploration of machine-learning algorithms capable of capturing more nuanced contextual information.
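The negation failure mode described above is easy to reproduce with a minimal lexicon-based detector in the spirit of MFD. The tiny lexicon below is a toy stand-in, not the actual MFD dictionary, and the matcher is a deliberately naive sketch:

```python
import re

# Toy stand-in for an MFD-style lexicon: word stem -> foundation label.
TOY_LEXICON = {
    "care": "Care+", "harm": "Care-",
    "fair": "Fairness+", "cheat": "Fairness-",
    "loyal": "Loyalty+", "betray": "Loyalty-",
}

def detect_foundations(text):
    """Count lexicon hits per foundation, ignoring all context."""
    counts = {}
    for token in re.findall(r"[a-z]+", text.lower()):
        for stem, label in TOY_LEXICON.items():
            if token.startswith(stem):
                counts[label] = counts.get(label, 0) + 1
    return counts

# The negation is invisible to a pure lexicon match: "not fair" still
# registers as a Fairness+ (virtue) hit -- a false positive.
print(detect_foundations("This policy is not fair and will harm many people"))
# {'Fairness+': 1, 'Care-': 1}
```

Because the matcher only sees isolated tokens, every occurrence of a virtue word counts towards the virtue pole regardless of surrounding negation or irony, which is precisely the source of the overgeneralisation and false positives discussed above.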

Advancements in MArgAn. The current results allow for the continuous enhancement of MArgAn. This enhancement should include an exploration of the relationship between moral foundations and argument schemes, a topic that has not yet been explored. For instance, our analysis reveals a distinct difference in the moral arguments used for support and attack. Moreover, it demonstrates that online discussions display a broader range of moral foundations, linked with an increased occurrence of negative moral arguments. Surprisingly, moral arguments with a negative load appear to hold more weight than initially presumed, especially in comparison with offline discourse. Moral argumentation also appears to encompass a broader range of phenomena than previously assumed by MFT. The integration of MFT and argument schemes could provide a robust framework for crafting effective moral arguments. This could significantly contribute to the refinement and expansion of argument schemes, enhancing their applicability and effectiveness in moral argumentation, and enable us to better understand moral reasoning and construct arguments that are more likely to resonate with an audience.

Data Availability. The argumentation annotation in this research primarily relies on IAT annotations. However, the data collection and existing IAT annotations are predominantly focused on the social-political domain, with limited coverage in more diverse areas. This limitation in scope could potentially affect the generalisability of our research findings and introduce biases in the analysis results.

10 Conclusion

In this paper, we introduced an AI-based argument technology, MArgAn, designed for the universal analysis of moral foundations in arguments, regardless of topic. We identified moral arguments and showcased how quantitative analysis can augment the qualitative analysis of arguments. We tested MArgAn on a large, manually annotated, carefully curated dataset of arguments covering diverse socio-political issues. Through a detailed data analysis, we demonstrated the robustness of the framework and highlighted the generalisability of our findings across various domains.

To conclude, our research reveals the intricate, context-dependent, and dynamic nature of both moral and no-moral arguments across diverse communication contexts. It challenges the initial assumptions of MFT and provides valuable insights into moral reasoning via argumentation. These findings carry implications for understanding how individuals construct moral arguments and express their perspectives in varied environments, especially within today’s diverse and evolving communication landscape. Furthermore, the MArgAn framework that we developed offers new opportunities for further advancing moral argument analytics.