
1 Introduction

Argumentation is central to human communication and interaction; hence, there are various strategies for implementing this concept in approaches to artificial intelligence. In the field of abstract argumentation [8], the underlying idea is to focus on the interplay between arguments and counterarguments rather than on the content of the arguments themselves.

The core formalism in this field is the abstract argumentation framework, which can be understood as a directed graph in which the nodes represent the given arguments, and the edges represent an attack relation between them.

Figure 1 shows an example of such a framework. Semantics are commonly expressed through so-called extensions, which are sets of arguments that jointly fulfill certain conditions. A fundamental semantics in the field of abstract argumentation is the concept of admissibility. For a set to be an admissible extension, its arguments must not attack each other (i. e., the set must be conflict-free), and they have to defend each other from all outside attacks.

Typical problems in abstract argumentation include the problem of deciding whether an argument is included in at least one extension (or all extensions) wrt. a specific semantics, or the problem of determining an extension or enumerating all of them wrt. a specific semantics.

The literature already provides different families of (exact) approaches to solve the above-mentioned reasoning problems in abstract argumentation. One such family consists of reduction-based approaches—see, e. g., [1, 9, 17, 21, 24]—which encode a given problem in a different formalism—e. g., as a Boolean satisfiability problem—and then use an existing solver for that formalism. Another family of approaches consists of backtracking-based methods that make use of heuristics to guide the search procedure—see, e. g., [12, 22, 23].

Since most of the reasoning problems in abstract argumentation are computationally hard [10], using an exact algorithm can result in exceedingly long runtimes. To counteract this issue, machine learning-based approaches have been proposed in the literature [6, 7, 15, 19]. However, although these approaches proved to be significantly faster than their exact counterparts, they are not guaranteed to yield correct results (for a deeper analysis, see also [16]). Thus, the main advantage of an exact method (such as a reduction- or backtracking-based approach) is that it always provides correct results, while the main advantage of a (purely heuristic) machine learning-based approach is its runtime performance. One approach for combining these advantages is the use of machine learning techniques to predict the “best” exact solver from a portfolio [14, 25]. In the work at hand, we aim to harness the advantages of machine learning methods in a different manner. More precisely, we use predictions made by a machine learning model to inform a heuristic that guides a backtracking-based approach, which ultimately yields a correct result. As an example of the overall approach, we consider the task of deciding whether a given argument is accepted under admissibility [8], which is a core aspect of many reasoning problems.

In an experimental evaluation we compare the use of our machine learning-based heuristic (using a random forest) to the standard heuristic of the backtracking-based solver Heureka [12], and we demonstrate that both the number of backtracking steps as well as the overall runtime can be reduced when our newly proposed heuristic is applied.

To summarize, our contributions are as follows:

  • We present an approach that exploits the strengths of both machine learning and reasoning techniques by using machine learning-based predictions to create a heuristic which can accelerate an exact, backtracking-based solver.

  • Our approach offers a flexible solution, as both the machine learning component and the backtracking-based solver can be specified as desired.

  • In an experimental analysis, we show that our approach leads to a significant decrease in both the number of backtracking steps and overall runtime, when compared to a standard heuristic.

The remainder of the paper is organized as follows. In Sect. 2 we provide some preliminaries on the topic of abstract argumentation. After giving an overview of current solution approaches for problems in abstract argumentation in Sect. 3, we propose a machine learning-guided heuristic in Sect. 4. An extensive experimental analysis is presented in Sect. 5. Sect. 6 details the limitations of our research, and finally we conclude in Sect. 7.

Fig. 1. An abstract argumentation framework.

2 Preliminaries

An abstract argumentation framework (AF) [8] is a tuple \(F= (\textsf{Args}, R)\), with \(\textsf{Args}\) being a set of arguments and \(R\subseteq \textsf{Args}\times \textsf{Args}\) defining an attack relation. An argument \(a\in \textsf{Args}\) attacks another argument \(b\in \textsf{Args}\) if \((a,b) \in R\). On the other hand, an argument \(a\in \textsf{Args}\) is defended by a set of arguments \(E \subseteq \textsf{Args}\) if for all \(b\in \textsf{Args}\) with \((b,a)\in R\), there exists a \(c \in E\) with \((c,b)\in R\).

An extension is a set of arguments that are jointly acceptable, given a set of conditions. Exactly which conditions need to be satisfied is determined by a semantics. There exists a multitude of different semantics in the literature; however, we focus on the preferred semantics introduced in the seminal paper by Dung [8].

Definition 1

Let \(F= (\textsf{Args}, R)\) be an argumentation framework. A set \(E\subseteq \textsf{Args}\) is

  • conflict-free if there are no \(a,b\in E\) such that \((a,b)\in R\),

  • admissible if E is conflict-free and each \(a\in E\) is defended by E within \(F\),

  • complete if E is admissible and every argument \(a \in \textsf{Args}\) defended by E is also included in E,

  • preferred if E is a \(\subseteq \)-maximal complete extension, and

  • grounded if E is a \(\subseteq \)-minimal complete extension.

Note that the grounded extension is uniquely determined [8].

Typical decision problems in the area of abstract argumentation include the problem of deciding whether a given argument is included in at least one extension (credulous acceptability) or in all extensions (skeptical acceptability) wrt. a given semantics. In the following, we denote the problem of deciding credulous acceptability wrt. preferred semantics as \(\textsf{DC}\). Note that this problem is equivalent to the problems of deciding credulous acceptability under admissible and under complete semantics.

Example 1

Consider the AF illustrated in Fig. 1. The conflict-free sets of this AF are

$$\begin{aligned} & \{ \emptyset , \{a_0\}, \{a_1\}, \{a_2\}, \{a_3\}, \{a_4\}, \\ & \{a_0, a_2\}, \{a_0, a_4\}, \{a_2, a_4\}, \{a_3, a_4\}, \\ & \{a_0, a_2, a_4\} \}. \end{aligned}$$

Out of these sets, only \(\emptyset \), \(\{a_4\}\), \(\{a_0,a_4\}\), and \(\{a_0, a_2, a_4\}\) defend themselves, and are thus admissible. Further, the only complete sets are \(\emptyset \) and \(\{a_0, a_2, a_4\}\), which makes the grounded extension (i.e., the \(\subseteq \)-minimal complete extension) \(\emptyset \), and the set of preferred extensions (i.e., the \(\subseteq \)-maximal complete extensions) consists only of \(\{a_0, a_2, a_4\}\). We can also see that the set of arguments contained in at least one admissible set (i.e., the set of credulously acceptable arguments wrt. admissibility) is \(\{a_0, a_2, a_4\}\), which is equal to the set of credulously accepted arguments under complete or preferred semantics.
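As an illustration, the following minimal Python sketch enumerates the conflict-free and admissible sets for a hypothetical attack relation chosen to be consistent with all the sets listed above; the actual relation in Fig. 1 may differ in details that the example leaves open:

```python
from itertools import combinations

args = {"a0", "a1", "a2", "a3", "a4"}
# A hypothetical attack relation consistent with Example 1.
attacks = {("a1", "a0"), ("a0", "a3"), ("a4", "a1"), ("a1", "a4"),
           ("a1", "a2"), ("a3", "a2"), ("a3", "a1")}

def conflict_free(E):
    return not any((a, b) in attacks for a in E for b in E)

def defended(E, x):
    # Every attacker of x must be counter-attacked by some member of E.
    return all(any((c, b) in attacks for c in E)
               for (b, t) in attacks if t == x)

def admissible(E):
    return conflict_free(E) and all(defended(E, a) for a in E)

subsets = [set(c) for r in range(6) for c in combinations(sorted(args), r)]
print([sorted(E) for E in subsets if conflict_free(E)])  # 11 sets
print([sorted(E) for E in subsets if admissible(E)])     # 4 sets
```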

3 Solution Approaches in Abstract Argumentation

The methods employed to address decision problems in abstract argumentation can be broadly categorized into reduction-based or direct approaches [5]. Reduction-based solvers operate by translating the reasoning problem into other formalisms, such as answer-set programming [9, 11], constraint-satisfaction problems [2, 4] or Boolean satisfiability [21, 26], leveraging existing solvers in those domains. The advantage of the reduction-based approach lies in the high efficiency of these existing solvers. On the other hand, direct approaches involve the implementation of a dedicated algorithm tailored to the structure of AFs, often utilizing backtracking. Direct solvers retain the structural information of the AF, allowing them to exploit specific shortcuts relevant to certain semantics [5].

Algorithm 1: Backtracking-based algorithm \(\textsf {SEARCH}\) for checking credulous acceptance wrt. admissibility

Algorithm 1 describes a simple backtracking strategy for deciding credulous acceptance. Given an AF F and two argument sets \(S_{in}\) and \(S_{out}\), the algorithm recursively explores potential admissible sets by considering the inclusion or exclusion of individual arguments. It terminates and returns False if \(S_{in}\) is not conflict-free, ensuring the absence of internal attacks. If \(S_{in}\) is already admissible, the algorithm returns True. Otherwise, it selects an argument a from the remaining set of arguments and recursively follows two branches: one including a in \(S_{in}\) and maintaining \(S_{out}\), and another excluding a from \(S_{in}\) and incorporating a into \(S_{out}\). If the search in the first branch succeeds, the second branch does not have to be explored. If the search in the first branch fails, we say that the algorithm backtracks, and it is required to continue with the second branch. The algorithm returns True if either branch results in an admissible set. The algorithm can determine whether an argument a is contained in at least one admissible/preferred/complete extension when called as \(\textsf {SEARCH}(F,\{a\},\emptyset )\).
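A minimal Python sketch of the \(\textsf {SEARCH}\) procedure as described in this paragraph; the original Algorithm 1 may differ in details such as the exact order of checks:

```python
def search(F, s_in, s_out):
    """Is there an admissible set extending s_in and avoiding s_out?

    F = (args, attacks): args is a set of arguments, attacks a set of
    (attacker, target) pairs. s_in / s_out hold the arguments currently
    assumed inside / outside the candidate admissible set.
    """
    args, attacks = F
    # Terminate if s_in is not conflict-free.
    if any((a, b) in attacks for a in s_in for b in s_in):
        return False
    # s_in is admissible if each member's attackers are counter-attacked.
    def defended(x):
        return all(any((c, b) in attacks for c in s_in)
                   for (b, t) in attacks if t == x)
    if all(defended(a) for a in s_in):
        return True
    remaining = args - s_in - s_out
    if not remaining:
        return False
    # The heuristic chooses which argument to branch on next (line 5).
    a = max(remaining)  # placeholder for a heuristic choice
    # First branch: include a; explore the second branch only on failure.
    return search(F, s_in | {a}, s_out) or search(F, s_in, s_out | {a})
```

Credulous acceptance of an argument a wrt. admissibility is then decided by calling search(F, {a}, set()); the heuristic choice of the branching argument (only a placeholder above) determines how quickly an admissible superset is found.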

The order in which arguments are processed—i. e., how argument a is determined in line 5 of Algorithm 1—plays a crucial role in the algorithm’s performance, and different heuristics can be employed for this purpose. The algorithm we use in our study specifies the order in a deterministic way: the selected heuristic calculates a confidence value for each argument, and these values are then arranged in descending order to establish a total ordering.

Example 2

Consider again the AF in Fig. 1, which has the preferred extension \(\{a_0,a_2,a_4\}\), and the task of deciding \(\textsf{DC}\) wrt. \(a_2\). Assuming the order determined by a certain heuristic is \((a_1, a_3, a_4, a_0, a_2)\), the binary tree visualizing the recursive calls needed to solve this task using Algorithm 1 has a depth of 4. In contrast, building the order based on a perfect prediction of each argument’s acceptability yields a depth of 2, thereby enhancing the algorithm’s efficiency.

Note that Algorithm 1 only showcases the general principle of search-based algorithms. Existing search-based solvers [12, 22, 23] are more involved and rely on similar techniques as DPLL- and CDCL-solvers from satisfiability solving [3].

4 Machine Learning-Guided Heuristics

The goal of this paper is to improve the performance of a direct solver by reducing the number of backtracking steps necessary to decide \(\textsf{DC}\) wrt. a given argument.

In order to do so, we employ a machine learning classifier and use the obtained predictions to guide a direct solver. For our research, we decided to predict the overall acceptance status of an argument and use this prediction, along with a confidence measure, to build our heuristic. This heuristic determines the order in which the search algorithm processes the arguments. One might question why we did not employ the classifier to directly predict the optimal order for each argument, thereby eliminating the need for a priority heuristic altogether. However, it is crucial to acknowledge that for each argument, there exist multiple ideal orderings that would effectively guide the solver.

Returning to Example 2, we determined \(\{a_0, a_2, a_4\}\) to be the preferred extension of this AF. However, when deciding \(\textsf{DC}\) with respect to \(a_2\), it does not matter whether we first pass \(a_0\) or \(a_4\) to the algorithm. Another possible approach would be to directly predict admissible sets. The author of [18] describes a similar approach, training a graph neural network to predict which arguments are jointly admissible and then using this information to guide a SAT-based solver. While this approach yielded promising results and provides opportunities for further study, it also requires extensive neural network training, whereas our goal was to investigate whether a lightweight solution could already provide substantial improvements. More information on the training process is provided in Sect. 5.1.

We pass the obtained prediction outcomes as input to a direct solver and use them to develop a heuristic that prioritizes arguments based on their predicted acceptability. Arguments predicted with higher confidence to be accepted wrt. \(\textsf{DC}\) are processed first, while those likely to be rejected are processed last. We compare the results to those obtained using a heuristic that has demonstrated effective performance for \(\textsf{DC}\) in prior research [12].

Following an analysis of some initial experiments, we refine our approach by crafting a heuristic tailored specifically to the query argument. Subsequently, we assess the performance on further datasets.

To determine whether an argument a is acceptable wrt. \(\textsf{DC}\), we need to find a preferred extension that contains a. In contrast, to prove that a is not acceptable wrt. \(\textsf{DC}\), we need to establish that there cannot be a preferred extension containing a. As any conflict with the grounded extension signifies the rejection of a, our strategy for proving that a is not acceptable wrt. \(\textsf{DC}\) revolves around assuming that a, as well as all arguments belonging to the grounded extension, is acceptable wrt. \(\textsf{DC}\), and devising a heuristic that prioritizes arguments likely not to be accepted. This approach aims to ensure that conflicts happen early in the justification process, in order to enhance overall performance. While initial experiments showed promising outcomes in AFs with substantial grounded extensions, the efficacy diminished in AFs with an empty grounded extension. Even when restricting the heuristic’s application to AFs with non-empty grounded extensions, the results failed to significantly surpass those of the standard heuristic. This may be attributed to the fluctuation in prediction accuracy when considering related arguments. As detailed in Sect. 5, although our RF model is able to classify most arguments correctly, when constructing a heuristic centered on a specific argument, substantial penalties can occur for inaccuracies in predicting related arguments. Additional research is needed to devise a robust heuristic for rejected arguments. Consequently, we exclusively consider accepted arguments in the subsequent sections of this paper.

5 Experimental Analysis

The following section offers an overview of the datasets we utilized, outlines our experimental setup, and presents the results obtained from our experiments.

5.1 Datasets and Setup

To conduct our experiments, we utilized the kwt-train and kwt-test datasets generated using the KwtDungTheoryGenerator as described in [16].

Each graph in these datasets consists of 151 arguments, and the training and test set each contain 1000 graphs.

To assess the performance of our heuristic on larger datasets, we employed the KwtDungTheoryGenerator to generate a more extensive dataset called kwt-large. This new dataset comprises 10,000 graphs designated for training (kwt-large-train) and an additional 1000 graphs reserved for testing (kwt-large-test). The graphs within this dataset span a range of 100 to 300 arguments, with a total of 148,483 accepted arguments within the kwt-large-test set.

In our research, the primary emphasis lies on enhancing the performance for arguments that are credulously accepted. Accordingly, we sought to employ a graph type featuring a substantial number of accepted arguments for our third dataset. To achieve this, we harnessed the AFBenchGen graph generator to create a supplementary set of 10,000 Barabasi graphs for training (Barabasi-train) and an additional 1,000 Barabasi graphs for testing (Barabasi-test). These graphs encompassed argument quantities ranging from 100 to 500, resulting in a total of 252,502 accepted arguments within the Barabasi-test set.

Previous research has suggested that standard machine learning classifiers are useful for predicting the acceptability status of arguments in an AF [13, 16]. In [13], random forest (RF) classifiers trained on a comprehensive feature set provided the best results. This feature set comprises 10 node- and graph-based properties, namely the degree, closeness, Katz [20], and betweenness centrality of each argument, as well as the number of strongly connected components (SCCs) of the AF, the size of the SCC each argument is part of, the average degree of the AF, and whether the AF is irreflexive, strongly connected, or aperiodic.
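To illustrate, the following sketch computes per-argument vectors of these 10 features with networkx; the exact feature definitions used in [13] may differ in detail (e.g., in how centralities are normalized):

```python
import networkx as nx

def af_features(G: nx.DiGraph) -> dict:
    """Map each argument of an AF (given as a directed graph) to a
    10-dimensional feature vector, mirroring the feature set above."""
    sccs = list(nx.strongly_connected_components(G))
    scc_size = {v: len(c) for c in sccs for v in c}
    closeness = nx.closeness_centrality(G)
    katz = nx.katz_centrality_numpy(G)
    betweenness = nx.betweenness_centrality(G)
    avg_degree = sum(d for _, d in G.degree()) / G.number_of_nodes()
    # Graph-level Boolean properties, shared by all arguments of the AF.
    irreflexive = int(not any(G.has_edge(v, v) for v in G))
    strongly_connected = int(nx.is_strongly_connected(G))
    aperiodic = int(nx.is_aperiodic(G))
    return {v: [G.degree(v), closeness[v], katz[v], betweenness[v],
                len(sccs), scc_size[v], avg_degree,
                irreflexive, strongly_connected, aperiodic]
            for v in G}
```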

Building on these results, we trained individual RF classifiers for each dataset. The training and testing procedures were executed using Python, making use of the scikit-learn and networkx libraries. For a detailed overview of all three datasets, please refer to Table 1. To quantify the efficacy of our classification results, we use the standard metrics of accuracy, recall (also referred to as true positive rate (TPR)), specificity (also referred to as true negative rate (TNR)), and precision, as well as the Matthews Correlation Coefficient (MCC). We define a true positive (\(\textsf{TP}\)) as an argument in an AF that is accepted wrt. \(\textsf{DC}\) and was correctly classified as such. Accordingly, a true negative (\(\textsf{TN}\)) is a non-accepted argument that is correctly classified as such, and false positives/negatives (\(\textsf{FP}\)/\(\textsf{FN}\)) are the corresponding falsely classified counterparts. Accuracy is defined as \(\frac{\textsf{TP}+ \textsf{TN}}{\textsf{TP}+ \textsf{TN}+ \textsf{FP}+ \textsf{FN}}\), precision as \(\frac{\textsf{TP}}{\textsf{TP}+\textsf{FP}}\), TPR as \(\frac{\textsf{TP}}{\textsf{TP}+\textsf{FN}}\), TNR as \(\frac{\textsf{TN}}{\textsf{TN}+\textsf{FP}}\), and MCC as \(\frac{\textsf{TP}\cdot \textsf{TN}- \textsf{FP}\cdot \textsf{FN}}{\sqrt{(\textsf{TP}+\textsf{FP}) \cdot (\textsf{TP}+ \textsf{FN}) \cdot (\textsf{TN}+\textsf{FP}) \cdot (\textsf{TN}+\textsf{FN})}}\).
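A minimal training and evaluation sketch with scikit-learn; the random stand-in data below merely takes the place of the actual per-argument feature vectors and \(\textsf{DC}\) labels:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, matthews_corrcoef)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))    # stand-in feature vectors
y = rng.integers(0, 2, size=1000)  # stand-in DC-acceptance labels

X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("TPR      :", recall_score(y_test, y_pred))               # recall
print("TNR      :", recall_score(y_test, y_pred, pos_label=0))  # specificity
print("MCC      :", matthews_corrcoef(y_test, y_pred))
```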

Table 1. Overview of the kwt, kwt-large and Barabasi datasets.

We chose the Heureka solver [12] due to its direct solution approach and its flexibility in incorporating custom heuristics to determine the order of arguments. A heuristic assigns a real-valued score to each argument through a mapping function; a higher value indicates a higher priority for that argument, influencing its processing order in the justification process. Specifically for \(\textsf{DC}\), Heureka employs a standard heuristic that emphasizes arguments within strongly connected components, combining this with a path-based component.

In our experiments, Heureka is executed on each argument within our test sets, capturing both runtime and backtracking steps for individual arguments. The standard heuristic serves as a benchmark for comparing the outcomes of our experiments. To control the overall runtime for each dataset, a timeout of 10 minutes per argument is implemented.

Table 2. Results for classifying the kwt, kwt-large and Barabasi test sets using an RF classifier trained on a total of 10 graph features.

5.2 Initial Experimental Analysis

We begin our experiments by training an RF classifier for each dataset. An overview of the classification metrics is provided in Table 2.

Our initial approach involves simply prioritizing the arguments predicted to be acceptable wrt. \(\textsf{DC}\). The order of argument processing is determined by calculating a score for each argument based on the percentage of trees that favor the assigned label. Arguments predicted not to be contained in any extension are ranked below the accepted ones; among these, we prioritize arguments whose predictions are close to the decision boundary.
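As a sketch of this ordering, note that a single descending sort by the predicted acceptance probability realizes both rules at once: confidently accepted arguments come first, and among the rejected ones those closest to the decision boundary precede the confidently rejected ones. This assumes a fitted binary scikit-learn classifier clf whose class 1 means “accepted wrt. \(\textsf{DC}\)”:

```python
def prediction_order(arguments, clf, features):
    """Order arguments by the simple prediction-based score; for a
    random forest, the probability of class 1 is the fraction of
    trees voting for acceptance."""
    p_accept = clf.predict_proba([features[a] for a in arguments])[:, 1]
    return [a for _, a in sorted(zip(p_accept, arguments),
                                 key=lambda t: t[0], reverse=True)]
```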

We evaluate all arguments that are acceptable wrt. \(\textsf{DC}\) in the kwt-test set using both the prediction-based ordering and the standard heuristic.

The results, presented in Table 3, indicate that the simple ordering we employed successfully reduced the need for backtracking in cases with relatively accurate predictions. However, this approach severely penalizes wrong predictions, which led to an overall increase in backtracking steps. Additionally, using the simple ordering heuristic, Heureka was unable to solve three AFs within the 10-minute time limit per argument.

Table 3. Results for classifying \(\textsf{DC}\) arguments in the kwt dataset using the standard Heureka heuristic as well as a simple prediction-based ordering.

To gain deeper insights into the limitations of this approach, we conducted a detailed analysis of the AF that required the highest number of backtracking steps. While applying the prediction-based ordering resulted in a staggering 21,482,709 backtracking steps, the standard heuristic was able to resolve this AF without any backtracking.

Upon closer examination, we discovered that out of the 151 arguments in this AF, only 9 specific arguments were responsible for all the backtracking steps. These 9 arguments were the sole accepted arguments that were erroneously predicted as not accepted by our model. Furthermore, all of these arguments belonged to the same extension, and critically, none of them belonged to any other extension. As a result, these crucial arguments, which would be highly valuable for guiding our search algorithm, ended up being processed toward the end of the solving process. Consequently, a straightforward ordering approach proved to be insufficient. To reduce the overall number of required backtracking steps, a more refined heuristic is needed.

To establish whether an argument a is acceptable wrt. \(\textsf{DC}\), we must identify a preferred extension E that contains a. Therefore, we want to prioritize arguments that are most likely part of E. We thus separate the arguments of the AF into three distinct sets: arguments that are likely not in E (outExt), arguments that defend a (defenders) and thus have the highest chance of being in E, and arguments that might be in E (possibleExt). Our refined algorithm starts by adding all arguments that are in conflict with a to the outExt set. Likewise, arguments predicted to be outside any extension are categorized within outExt. We then iterate through the remaining arguments, identifying whether they act as defenders of a by attacking its attackers, or whether they undermine E by attacking arguments likely to be part of E (in which case they are also added to outExt). Arguments that fall into neither of these categories are placed in the possibleExt set. Once all arguments are processed, we determine the heuristic order, ensuring that all defenders are processed first. This is achieved by multiplying the prediction probability of each argument in a set by a dedicated factor for that set. Let the factors used to multiply the prediction confidence values be denoted as \(\textsf{x}\), \(\textsf{y}\), and \(\textsf{z}\) for the defenders, possibleExt, and outExt sets, respectively. The actual value of the factors is arbitrary, as long as the following conditions hold: \(\textsf{x}> \textsf{y}\) and \(\textsf{z}>1\). For more detailed information, please refer to Algorithm 2. In our experiments we set \(\textsf{x}= 1000\), \(\textsf{y}= 100\), and \(\textsf{z}= 2\).

Algorithm 2: MLPred heuristic for accepted arguments
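The following Python sketch reconstructs the steps of Algorithm 2 from the description above; the set-membership tests and tie-breaking used in the original may differ:

```python
def mlpred_scores(F, a, pred, conf, x=1000, y=100, z=2):
    """Sketch of the MLPred heuristic for a query argument a.

    pred[v] is True if v is predicted accepted wrt. DC; conf[v] is the
    confidence of that prediction (at least 0.5 for the predicted
    label). Higher scores mean earlier processing."""
    args, attacks = F
    attackers_of_a = {b for (b, t) in attacks if t == a}
    # Arguments in conflict with a, or predicted rejected, go to outExt.
    out_ext = {v for v in args - {a}
               if (v, a) in attacks or (a, v) in attacks or not pred[v]}
    defenders, possible_ext = set(), set()
    for v in args - out_ext - {a}:
        if any((v, b) in attacks for b in attackers_of_a):
            defenders.add(v)   # v defends a by attacking an attacker of a
        elif any((v, w) in attacks for w in args - out_ext - {v}):
            out_ext.add(v)     # v attacks an argument likely in E
        else:
            possible_ext.add(v)
    factor = {v: x for v in defenders}
    factor.update({v: y for v in possible_ext})
    factor.update({v: z for v in out_ext})
    return {v: factor[v] * conf[v] for v in args - {a}}
```

Sorting the arguments by these scores in descending order yields the processing order; \(\textsf{x}> \textsf{y}\) ensures that defenders always precede the merely possible members of E.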

Running Heureka with this refined approach yielded a significant reduction in backtracking, nearly reaching a \(90\%\) reduction, and enabled Heureka to successfully solve all AFs within the allocated time, as shown in Table 4. However, when evaluated against the standard heuristic, it is evident that the total number of backtracking steps, though significantly improved, still falls short of matching the performance of the standard heuristic.

Table 4. Results for classifying \(\textsf{DC}\) arguments in the kwt dataset using the standard Heureka heuristic as well as the MLPred heuristic explained in Algorithm 2.

Within our dataset, all instances of backtracking occur in AFs where the predictive accuracy is not perfect. As we have observed during our in-depth analysis of an individual AF, the quality of predictions can vary not only between AFs but also among arguments within the same AF. Therefore, we require a method to make an informed choice of whether we can rely on the predictions generated by the machine learning model to effectively guide Heureka.

In our algorithm, the defenders set comprises the most critical arguments, as these directly support our query argument a. We operate on the assumption that a larger defenders set implies a more informative prediction for guiding Heureka. We therefore employ a threshold parameter below which we opt to use the standard heuristic instead of the prediction. More specifically, this threshold dictates the required size of the defenders set in relation to the possibleExt set. In our experiments, we employed a threshold of 0.35. Re-running Heureka with this threshold produced the results presented in Table 5.
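Under the interpretation that the threshold bounds the ratio of the sizes of the defenders and possibleExt sets, the fallback decision amounts to the following check (a hypothetical helper, not Heureka’s actual interface):

```python
def use_mlpred(defenders, possible_ext, threshold=0.35):
    # Fall back to the standard heuristic when the defenders set is
    # small relative to possibleExt, i.e., when the prediction is
    # deemed uninformative for this query argument.
    return len(defenders) >= threshold * len(possible_ext)
```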

Table 5. Results for classifying the kwt dataset using the standard Heureka heuristic as well as the MLPred heuristic explained in Algorithm 2 with a threshold of 0.35.

By implementing the threshold to filter out uninformative predictions, we successfully reduced the number of backtracking steps by nearly \(50\%\). In the following section we will evaluate our initial results using larger, more diverse datasets.

5.3 Evaluation and Results

The first evaluation experiment involved running Heureka on the kwt-large dataset using the same prediction quality threshold of 0.35. The MLPred heuristic resulted in a substantial reduction of the backtracking steps required for justifying accepted arguments compared to the standard heuristic, as demonstrated in Table 6. Notably, the MLPred heuristic enabled Heureka to successfully solve 994 AFs, whereas the standard heuristic was only able to solve 895 AFs without encountering timeouts.

Table 6. Results for classifying the kwt-large dataset using the standard Heureka heuristic as well as the MLPred heuristic explained in Algorithm 2 with a threshold of 0.35.

We also experienced a drastic decrease in runtime when using the MLPred heuristic. Table 7 shows an overview of the runtime in seconds needed to solve the 895 AFs that both heuristics were able to solve completely. The MLPred heuristic achieved a runtime reduction of 85%. The performance gain achieved by using the MLPred heuristic is also evident when comparing the runtime for individual arguments. Figure 2a shows the runtime comparison for both heuristics. To limit the overview to instances of a certain difficulty, we only plot arguments where the number of backtracking steps exceeds the mean for at least one heuristic. We can clearly see that on the vast majority of arguments the MLPred heuristic outperformed the standard heuristic.

Table 7. Runtime in seconds for classifying the kwt-large dataset using the standard Heureka heuristic as well as the MLPred heuristic explained in Algorithm 2 with a threshold of 0.35.
Fig. 2. Runtime per argument for arguments with above-average backtracking steps.

In order to assess the performance of the prediction heuristic on a different graph type, we ran the same experiments on the Barabasi-test dataset. Again, the prediction heuristic resulted in a significant reduction in the need for backtracking. In fact, as can be seen in Table 8, the need for backtracking was eliminated almost completely.

We also compared the runtime of both heuristics on the Barabasi dataset. Interestingly, as is evident in Table 9, the standard heuristic was overall the faster choice for the AFs that both heuristics could solve. This stems from the fact that Heureka needs additional time to parse the predictions and build the MLPred heuristic. As the graphs in this dataset can generally be solved much faster than those in the kwt-large dataset, the decreased runtime of the justification process is not enough to outweigh the overhead added by using the prediction.

However, when comparing the runtime for the individual arguments in Fig. 2b, we can see that the MLPred heuristic performed better on the hardest arguments of this dataset. Combined with the fact that the MLPred heuristic was able to solve all AFs of this dataset, we can still conclude that using a machine learning prediction was beneficial when solving the Barabasi dataset.

6 Limitations

Our study primarily focuses on enhancing the solution runtime for arguments that are accepted wrt. \(\textsf{DC}\). While we successfully utilized machine learning predictions to guide the Heureka solver in justifying rejected arguments in several test cases, our approach did not yield satisfactory results when applied to a larger number of arguments. Additionally, our investigation only considered credulous acceptance under preferred semantics. Furthermore, we limited our analysis to two different graph types, namely kwt and Barabasi graphs.

While kwt graphs are intentionally designed to pose a challenge wrt. deciding \(\textsf{DC}\), and Barabasi graphs are advantageous to our study due to their tendency to contain a large number of accepted arguments, it would be beneficial to assess the efficacy of our approach on other graph types in the future. This assessment should specifically include graphs used as benchmarks in the International Competition on Computational Models of Argumentation, enabling a direct comparison to other state-of-the-art solvers.

Table 8. Results for classifying the Barabasi dataset using the standard Heureka heuristic as well as the MLPred heuristic explained in Algorithm 2 with a threshold of 0.35.
Table 9. Runtime in seconds for classifying the Barabasi dataset using the standard Heureka heuristic as well as the MLPred heuristic explained in Algorithm 2 with a threshold of 0.35.

We chose a lightweight approach, employing a standard random forest classifier trained on different graph properties. Although the classification results were reasonably good, more advanced techniques such as neural networks have demonstrated even better results and could, therefore, prove beneficial in our pursuit to improve the runtime of justification algorithms.

7 Conclusion

The goal of our research was to improve the runtime of a search-based solver by reducing the number of backtracking steps needed to decide whether an argument is credulously accepted wrt. preferred semantics (\(\textsf{DC}\)).

Our study revealed that using machine learning predictions to assist a search-based solver leads to notable advantages in minimizing backtracking steps and improving runtime in decision-making processes, specifically in the context of argument acceptance. The integration of machine learning resulted in a significant reduction of backtracking steps, achieving a minimum reduction of 96%. Across all datasets examined, the overall runtime could be decreased by up to 85%. Furthermore, our approach was able to enhance the solvability of argumentation frameworks within a specified time constraint.

Further research opportunities could involve combining the classifier and the solver into a standalone application, eliminating the necessity to provide the solver with external predictions. However, given that the prediction quality of the RF classifier depends on the similarity between training and testing data, exploring alternative classifiers becomes imperative. Existing studies propose that employing graph neural networks holds promise for achieving robust prediction results.

Notably, our research did not yield significant improvements for rejected arguments. Subsequent investigations should look into strategies to effectively apply predictions to rejected arguments.