
1 Introduction

Dating back to the times of Aristotle, causality as a concept is intimately linked with human reasoning and the formation of arguments (Evans, 1959; Falcon, 2006). When trying to establish the truth about a topic by exchanging arguments, we rely on causal relations to ground our claims within the realm of observed and already agreed-upon knowledge (Hume, 1896). In short, causality is the key factor that lets us distinguish between mere coincidences and true relations of cause and effect (Mackie, 1980; Aldrich, 1995).

While day-to-day arguments may or may not involve explicit notions of causality, we assume in this paper that certain mechanisms dictate the unfolding of events in our everyday lives. While, in principle, arbitrary events could alter the course of things, we assume that there exists a 'natural', that is, an unintervened, unfolding of events. When arguing about possible alternative outcomes, one might try to construct a 'counterfactual' unfolding of events that requires only small deviations from the otherwise natural unfolding. This heuristic rests on the assumption that explanations should adhere to previous observations and past experience. In this paper, we argue that arguments should stay close to previous experience and only be abandoned in the light of new evidence. Such situations typically arise only when explicit information about interventions taking place is obtained, or when observations deviate from expectations to such an extent that assuming an intervention as the underlying cause becomes inevitable.

In the following sections, we discuss the use of different counterfactual explanation methods within graphical causal models. More formally, we utilize the Pearlian notion of causality (Pearl, 2009) to reason about underlying causal relations. While we derive our reasoning method within the formalism of Pearlian causality, we want to point out that several methods exist to transform arguments into causal Bayesian networks and, vice versa, to extract arguments from such networks (Bex et al., 2016; Timmer et al., 2017; Wieten et al., 2019).

Contributions. In the following, we briefly sketch a possible application of our idea in a potential court hearing. We discuss the benefits and shortcomings of classical interventional and backtracking counterfactuals, both of which aim to generate arguments for hypothetical and/or counterfactual scenarios. We propose an algorithm that infers the most plausible explanation from the natural unfolding of a system and falls back to interventional explanations when needed. Lastly, we propose the use of infinitesimal probabilities in causal models as a way of comparing explanations across multiple interventional distributions.

1.1 Introductory Example

To the best of our knowledge, counterfactual causal reasoning has not yet been applied to the field of formal argumentation (as summarized, for example, by Baroni et al. (2011)). Such approaches might be particularly useful in settings where one argues over hypothetical, counterfactual scenarios. We now present a hypothetical application example to motivate the assumptions made in the following sections.

Consider the hypothetical scenario of a debate during a court hearing on whether or not a store employee could have helped an injured customer. While the fact that the employee did not help is undisputed, a defense attorney might try to argue that any attempt to provide such help would have been bound to fail. All arguments therefore concern non-observable, counterfactual outcomes. In such a scenario, a successful defense would naturally try to derive zero probability for all positive outcomes (in terms of successfully helping the customer). The opposing prosecutor would try to come up with feasible counterarguments. A possible line of argument could go as follows:

Argument: Even if the employee had been willing to help, no medicine was available.
Attack: Assuming that medicine would have become available, the employee should have started with the emergency procedure.

Argument: The employee did not receive proper training to start the procedure.
Attack: Proper training was offered to all employees. And so on...

Every argument tries to reduce the probability of a successful help outcome to zero. In such cases, the only remaining attack is to assume that some latent factor (e.g. presence of medicine, proper training, ...) could have taken a value other than the one claimed. Given the benefit of the doubt, every such assumption required to accuse the employee weakens the indictment and lowers the chance of conviction. The total number of such necessary assumptions is likely to influence the final court ruling. In the following, we capture this notion via a preorder on argument preference in Sect. 3.2.

2 Preliminaries and Related work

In general, we write indexed sets of variables in bold upper-case \(\mathbf{X}\) and their values in bold lower-case \(\mathbf{x}\). Single variables and their values are written in normal style (\(X\), \(x\)). Specific elements of a tuple are indicated by a subscript index \(X_i\). Probability distributions of a variable \(X\) or a tuple \(\mathbf{X}\) of variables are denoted by \(P_X\) and \(P_{\mathbf{X}}\), respectively.

Structural Causal Models. Structural Causal Models (SCM) provide a framework for formalizing a notion of causality via graphical models (Pearl, 2009). They can be expressed as structural equation models without affecting expressiveness (Rubenstein et al., 2017). We adopt a slightly modified definition of SCM that models an explicit set of allowed interventions, similar to the earlier works of Rubenstein et al. (2017); Beckers and Halpern (2019); Willig et al. (2023).

Definition 1

A structural causal model is a tuple \(\mathcal{M} = (\mathbf{V}, \mathbf{U}, \mathbf{F}, \mathcal{I}, P_{\mathbf{U}})\) forming a directed acyclic graph \(\mathcal{G}\) over the indexed set of variables \(\mathbf{X} = \{X_1, \dots, X_K\}\) taking values in \(\pmb{\mathcal{X}} = \prod_{k \in \{1 \dots K\}} \mathcal{X}_k\), subject to a strict partial order \(<_{\mathbf{X}}\) over \(\mathbf{X}\), where

  • \(\mathbf{V} = \{X_1, \dots, X_N\} \subseteq \mathbf{X}\), \(N \le K\), is the indexed set of endogenous variables.

  • \(\mathbf{U} = \mathbf{X} \setminus \mathbf{V} = \{X_{N+1}, \dots, X_K\}\) is the indexed set of exogenous variables.

  • \(\mathbf{F}\) is the indexed set of deterministic structural equations, \(V_i := f_i(\mathbf{X}')\), where the parents are \(\mathbf{X}' \subseteq \{X_j \in \mathbf{X} \mid X_j <_{\mathbf{X}} V_i\}\).

  • \(\mathcal{I} \subseteq \{\{I_{i,d_i} \mid i \in \mathbf{i}, d_i \in \mathbf{d}\}_{\mathbf{i} \subseteq \{1 \dots N\}}\}_{\mathbf{d} \in \pmb{\mathcal{J}}}\), where \(\pmb{\mathcal{J}}\) is the set of possible (generally unknown) joint distributions \(\mathbf{d}\) on \(\pmb{\mathcal{X}}\) and \(I_{i,d_i}\) indicates an intervention \(do(X_i \sim d_i)\) in which the value of \(X_i\) is sampled from the \(i\)-th marginal distribution of \(\mathbf{d}\). We write arbitrary sets of interventions on \(\mathbf{X}' \subseteq \mathbf{X}\) as \(\mathbf{I}_{\mathbf{X}'} \in \mathcal{I}\).

  • \(P_{\mathbf{U}}\) is the probability distribution over \(\mathbf{U}\).

By construction, at most one intervention on any specific variable is included in any intervention set \(\mathbf{I} \in \mathcal{I}\). When \(\pmb{\mathcal{J}}\) is defined to equal \(\delta(\pmb{\mathcal{X}})\) (with \(\delta(\pmb{\mathcal{X}})\) being the set of all possible Dirac distributions over \(\pmb{\mathcal{X}}\)), \(\mathcal{I}\) models sets of atomic interventions. An atomic intervention on a single variable, \(do(X_i \sim \delta(x'_i))\), places all probability mass on a single value \(x'_i\). Consequently, the unintervened \(f_i\) can be replaced by the constant assignment \(X_i := x'_i\), and we write \(do(X_i = x'_i)\) and \(I_{i,x_i}\), respectively. Every \(\mathcal{M}\) entails a DAG structure \(\mathcal{G} = (\mathbf{X}, \mathcal{E})\) consisting of vertices \(\mathbf{X}\) and edges \(\mathcal{E}\), where a directed edge from \(X_j\) to \(X_i\) exists if \(\exists x_0, x_1 \in \mathcal{X}_j.\ f_i(\mathbf{x}', x_0) \ne f_i(\mathbf{x}', x_1)\). For every variable \(X_i\) we define \(\operatorname{ch}(X_i)\), \(\operatorname{pa}(X_i)\) and \(\operatorname{an}(X_i)\) as the sets of direct children, direct parents and ancestors, respectively, according to \(\mathcal{G}\). Every \(\mathcal{M}\) entails an observational distribution \(P_{\mathcal{M}}\) by pushing forward \(P_{\mathbf{U}}\) through \(\mathbf{F}\). An intervention \(do(X_i \sim d_i)\) replaces \(f_i\) by a function sampling from \(d_i\). As a consequence, \(\mathcal{M}\) might entail infinitely many intervened distributions \(P^{\mathbf{I}}_{\mathcal{M}}\), generally preventing us from simultaneously modeling all possible scenarios that might arise during an argument (see Sect. 3.3).
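
To make Definition 1 concrete, the following is a minimal Python sketch of a finite, discrete SCM with atomic interventions. It is illustrative only: the class name, the restriction to independent exogenous marginals, and the brute-force enumeration in prob are our own simplifying assumptions, not part of the formal definition.

```python
import itertools
from typing import Callable, Dict, List

class SCM:
    """Minimal sketch of a finite, discrete SCM (illustrative, not the paper's code)."""

    def __init__(self,
                 exogenous: Dict[str, Dict[object, float]],
                 equations: Dict[str, Callable[[Dict[str, object]], object]],
                 order: List[str]):
        self.exogenous = exogenous  # P_U, assumed here as independent marginals
        self.equations = equations  # structural equations f_i for endogenous V
        self.order = order          # topological order respecting <_X

    def sample_all(self, u: Dict[str, object],
                   do: Dict[str, object] = None) -> Dict[str, object]:
        """Push one exogenous configuration u through F, honoring atomic do()."""
        do = do or {}
        values = dict(u)
        for v in self.order:
            if v in self.exogenous:
                continue
            values[v] = do[v] if v in do else self.equations[v](values)
        return values

    def prob(self, event: Dict[str, object],
             do: Dict[str, object] = None) -> float:
        """P^I_M(event), computed by brute-force enumeration of U."""
        names = list(self.exogenous)
        total = 0.0
        for combo in itertools.product(*(self.exogenous[n] for n in names)):
            u = dict(zip(names, combo))
            p_u = 1.0
            for n in names:
                p_u *= self.exogenous[n][u[n]]
            values = self.sample_all(u, do)
            if all(values[k] == val for k, val in event.items()):
                total += p_u
        return total
```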

3 Backtracking in Causal Models

In this section, we briefly review inference for the classical 'interventional' counterfactuals of Pearl (2009) and compare them to the 'backtracking' counterfactuals of Von Kügelgen et al. (2023). We then present a scenario where backtracking counterfactuals fail to explain the given evidence and propose an iterative method to remedy the situation. Since observational and counterfactual values might be inferred over the same set of variables, we denote the corresponding counterfactual quantities with a superscript asterisk, e.g. \(\mathbf{y}^*\).

Interventional Counterfactuals. We write \(P_{\mathcal{M}}(\mathbf{Y}_{\mathbf{x}^*} = \mathbf{y}^* \mid \mathbf{e})\) to express the counterfactual question: "What would be the probability of some \(\mathbf{Y} \subseteq \mathbf{V}\) taking values \(\mathbf{y}^*\), given some observations (or evidence) \(\mathbf{e}\), had the variables \(\mathbf{X}^* \subseteq \mathbf{X}\) taken values \(\mathbf{x}^*\)?" Given a tuple \((\mathbf{e}, \mathbf{x}^*)\), classical counterfactual inference is performed in three steps (Pearl, 2009):

Step 1 (abduction): Infer the most probable configuration \(\mathbf{u}\) given evidence \(\mathbf{e}\) by maximizing \(P(\mathbf{u} \mid \mathbf{e})\).

Step 2 (action): Act on the model \(\mathcal{M}\) by applying the interventions \(do(\mathbf{X}^* = \mathbf{x}^*)\), such that \(\mathbf{F}' = \{f_i \in \mathbf{F} \mid \nexists I_{j,v_j} \in \mathbf{I}.\ i = j\} \cup \{X_i := x^*_i\}_{x^*_i \in \mathbf{x}^*}\) and \(\mathcal{M}^{\mathbf{I}} = (\mathbf{V}, \mathbf{U}, \mathbf{F}', \emptyset, P_{\mathbf{U}})\).

Step 3 (prediction): Compute \(P^{\mathbf{I}}_{\mathcal{M}}(\mathbf{Y} = \mathbf{y}^* \mid \mathbf{u})\).

The most likely counterfactual configuration of the variables \(\mathbf{Y}\) can then be obtained by searching for a \(\mathbf{y}^*\) that maximizes \(P^{\mathbf{I}}_{\mathcal{M}}(\mathbf{Y} = \mathbf{y}^* \mid \mathbf{u})\) in the third step.
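
The sketch below, building on the hypothetical SCM class above, walks through the three steps on a toy two-variable model. The model, all names, and the maximum-a-posteriori abduction over a finite exogenous domain are illustrative choices, not a prescribed implementation.

```python
import itertools

# Toy SCM: X0 := U0, X1 := X0 (the structure revisited in Sect. 3.1).
scm = SCM(
    exogenous={"U0": {True: 0.7, False: 0.3}},
    equations={"X0": lambda v: v["U0"], "X1": lambda v: v["X0"]},
    order=["U0", "X0", "X1"],
)

def interventional_counterfactual(scm, evidence, do, query):
    # Step 1 (abduction): most probable exogenous configuration given evidence.
    names = list(scm.exogenous)
    best_u, best_p = None, -1.0
    for combo in itertools.product(*(scm.exogenous[n] for n in names)):
        u = dict(zip(names, combo))
        if not all(scm.sample_all(u)[k] == v for k, v in evidence.items()):
            continue
        p = 1.0
        for n in names:
            p *= scm.exogenous[n][u[n]]
        if p > best_p:
            best_u, best_p = u, p
    if best_u is None:
        return None  # the evidence has zero support
    # Step 2 (action) and Step 3 (prediction): push the inferred u through
    # the intervened model and read off the query variables.
    values = scm.sample_all(best_u, do)
    return {k: values[k] for k in query}

# "Had X0 been False, what would X1 have been, given that we observed X1 = True?"
print(interventional_counterfactual(scm, {"X1": True}, {"X0": False}, ["X1"]))
# -> {'X1': False}
```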

Backtracking Counterfactuals. By interpreting the counterfactual quantities \(\mathbf{X}_i = \mathbf{x}^*_i\) as interventions, interventional counterfactuals 'detach' the affected variables from the inferred \(\mathbf{u}\) by overwriting their structural equations. Von Kügelgen et al. (2023) embed counterfactual values more naturally into the inference framework by fixing \(\mathbf{x}^*\) and backtracking, without interventions, to a counterfactual configuration \(\mathbf{u}^*\) that entails \(\mathbf{x}^*\) but might differ from the \(\mathbf{u}\) inferred from the observations \(\mathbf{e}\). Backtracking counterfactuals thus preserve the unaltered graph structure at the price of inducing a new \(\mathbf{u}^*\). Throughout the inference of \(\mathbf{y}^*\), the values of \(\mathbf{u}\) and \(\mathbf{u}^*\) should be kept as close as possible (with regard to some similarity measure), such that \(P(\mathbf{U}, \mathbf{U}^*)\) is maximized.
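
A minimal sketch of the backtracking idea, again based on the illustrative SCM class above: instead of intervening, we search for an exogenous \(\mathbf{u}^*\) that already entails the counterfactual target and stays close to the abducted \(\mathbf{u}\). The Hamming distance stands in here for the unspecified similarity measure; all names are our own.

```python
import itertools

def backtrack(scm, u_observed, target):
    """Search for a u* with P(u*) > 0 that entails the counterfactual target,
    minimizing the (Hamming) distance to the abducted u_observed."""
    names = list(scm.exogenous)
    best = None
    for combo in itertools.product(*(scm.exogenous[n] for n in names)):
        u_star = dict(zip(names, combo))
        p = 1.0
        for n in names:
            p *= scm.exogenous[n][u_star[n]]
        if p == 0.0:
            continue  # u* must remain plausible under P_U
        values = scm.sample_all(u_star)
        if all(values[k] == v for k, v in target.items()):
            dist = sum(u_star[n] != u_observed[n] for n in names)
            if best is None or dist < best[0]:
                best = (dist, u_star, values)
    return best  # None if no exogenous configuration entails the target
```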

Comparison. Comparing the two approaches, one sees that either the structural equations of \(\mathcal{M}\) are altered or a new set of exogenous values is inferred in order to explain a different outcome. As we want to minimize the use of interventions and apply them only where no other option is viable, we consider backtracking counterfactuals the default technique for inferring explanations. Stated more simply: 'If we can explain a counterfactual scenario without the use of external interventions, we will do so.' Depending on the specific metric connecting \(\mathbf{u}\) and \(\mathbf{u}^*\), backtracking counterfactuals might infer a vector \(\mathbf{u}^*\) that is inconsistent with our evidence \(\mathbf{e}\). Since we aim for arguments that are coherent with the given observations, we require all \(\mathbf{v}^*\) to take the same values as their counterparts \(\mathbf{v}\) constrained via the evidence \(\mathbf{e}\). Otherwise, one could always produce explanations by choosing an arbitrary similarity metric and simply disregarding the observed evidence. We now discuss a scenario where backtracking counterfactuals are unable to explain certain situations and interventions need to be applied.

3.1 When Backtracking is not Enough

Under certain conditions, backtracking will always be able to infer a plausible configuration \(\mathbf{u}\) given some observation \(\mathbf{e}\). Specifically, this is the case whenever the distribution has full support over \(\pmb{\mathcal{X}}\), implying that for any \(\mathbf{x} \in \pmb{\mathcal{X}}\) the quantity \(P(\mathbf{x})\) is non-zero. Thus, for any \(\mathbf{e}\) we can always find some value \(\mathbf{u}\) of \(\mathbf{U}\) such that \(P_{\mathcal{M}}(\mathbf{u}) \ne 0\). However, there are many situations in which the SCM will not have full support over the joint domain. For better intuition, we reiterate the example given by Von Kügelgen et al. (2023, Remark 4): even for the simplest structural equations, \(X_0 := U_0;\ X_1 := X_0\), with \(U_0, X_0, X_1\) being Boolean, we are unable to explain the observation \((X_0 = \texttt{True}, X_1 = \texttt{False})\) via any value of \(U_0\). While this example might seem oversimplified, it demonstrates the general problem of the backtracking approach: whenever there exists a deterministic relation between variables (e.g. \(X_1 := X_0\)), we are unable to set both variables independently to distinct values using \(U_0\). In such settings, we can always choose some \(x_1 \ne x_0\) as evidence and end up with a situation that cannot be explained via backtracking.
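
Enumerating the exogenous values of this example confirms the failure; with the hypothetical backtrack sketch from above, the observation has no interventionless explanation:

```python
# The Boolean example from Remark 4: X0 := U0, X1 := X0.
scm = SCM(
    exogenous={"U0": {True: 0.5, False: 0.5}},
    equations={"X0": lambda v: v["U0"], "X1": lambda v: v["X0"]},
    order=["U0", "X0", "X1"],
)
# No value of U0 entails (X0=True, X1=False): backtracking fails.
print(backtrack(scm, {"U0": True}, {"X0": True, "X1": False}))  # -> None
```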

3.2 Iterative Backup

In the aforementioned case, there is no choice but to resort to interventions. However, the number of interventions deployed on the system should be kept minimal. We propose a simple algorithm (cf. Fig. 1) that gradually backs up to explanations with higher numbers of interventions whenever a natural, interventionless explanation cannot be derived from the evidence.

Fig. 1. The IterativeBacktrack algorithm. The algorithm searches through different classes of explanations, starting at the unintervened \(\mathscr{I}_0\). If no suitable explanation is obtained, it gradually backs up to \(\mathscr{I}_{i>0}\) to search for explanations with higher numbers of interventions. Within the procedure, standard backtracking (Backtrack) is performed; after the first, unintervened iteration, however, it is always applied over an already intervened model \(\mathcal{M}^{\mathbf{I}'_{\bar{\mathbf{V}}}}\).

Order of Preference. To express a preference for the natural unfolding of a system, we establish an ordering that prefers explanations with no or few interventions over those requiring larger numbers of interventions. We express this preference with the following preorder:

$$\begin{aligned} (\mathbf{x}^*, \mathbf{I}_{\mathbf{V}})< (\mathbf{x}^*, \mathbf{I}_{\mathbf{V} \setminus V_i})< \dots< (\mathbf{x}^*, \{I_{V_i}, I_{V_j}\})_{i \ne j}< (\mathbf{x}^*, \{I_{V_i}\}) < (\mathbf{x}^*, \emptyset) \end{aligned}$$
(1)

where each explanation \((\mathbf{x}^*, \mathbf{I}_{\bar{\mathbf{V}}})\) with \(\bar{\mathbf{V}} \subseteq \mathbf{V}\) consists of a counterfactual variable assignment \(\mathbf{x}^*\) and a set of interventions \(\mathbf{I}_{\bar{\mathbf{V}}}\) that leads to a non-zero probability \(P^{\mathbf{I}}_{\mathcal{M}}(\mathbf{x}^*)\). By this preorder, one explanation \((\mathbf{x}^*, \mathbf{I}_{\bar{\mathbf{V}}})\) is preferred over another \((\mathbf{x}^*, \mathbf{I}'_{\bar{\mathbf{V}}})\) whenever \(|\mathbf{I}_{\bar{\mathbf{V}}}| < |\mathbf{I}'_{\bar{\mathbf{V}}}|\). As an immediate consequence, this ordering groups explanations by their number of interventions. We define explanation classes \(\mathscr{I}_i\) with \(i \in \mathbb{N}_0\) and let \((\mathbf{x}^*, \mathbf{I}_{\bar{\mathbf{V}}}) \in \mathscr{I}_i\) iff \(|\mathbf{I}_{\bar{\mathbf{V}}}| = i\). With a slight abuse of terminology, we also refer to any resulting \(P^{\mathbf{I}_{\bar{\mathbf{V}}}}_{\mathcal{M}}(\mathbf{x}^*)\) as belonging to \(\mathscr{I}_i\) whenever the associated \((\mathbf{x}^*, \mathbf{I}_{\bar{\mathbf{V}}}) \in \mathscr{I}_i\). Algorithm 1 searches for the best explanation within each class \(\mathscr{I}_i\) by finding the most probable configuration \(\mathbf{x}^*\) via classical backtracking counterfactuals, maximizing the probability by varying \(\mathbf{I}\) under the constraint \(|\mathbf{I}| = i\). Naturally, the algorithm starts at \(\mathbf{I} = \emptyset\) and gradually searches through the higher classes \(\mathscr{I}_i\). Whenever no viable backtracking explanation exists for a given set of interventions, the backtracking algorithm's initial assumption that \(P^{\mathbf{I}_{\bar{\mathbf{V}}}}_{\mathcal{M}}(\mathbf{x}^*) \ne 0\) is violated. For such cases, we assume that the \(\operatorname{argmax}\) returns an arbitrary \((\mathbf{x}^*, \mathbf{I}_{\bar{\mathbf{V}}})\) whose probability \(P^{\mathbf{I}_{\bar{\mathbf{V}}}}_{\mathcal{M}}(\mathbf{x}^*)\) evaluates to zero.
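
The following sketch illustrates this iterative backup, reusing the hypothetical SCM and backtrack helpers from above. It is a simplified reading of Algorithm 1, assuming atomic interventions with finite candidate domains, and may differ from the exact formulation in Fig. 1.

```python
import itertools

def iterative_backtrack(scm, u_observed, target, endogenous, domains):
    """Search the explanation classes I_0, I_1, ... in order of preference and
    return the first (fewest-interventions) backtracking explanation found."""
    for i in range(len(endogenous) + 1):  # class I_i: exactly i interventions
        best = None
        for subset in itertools.combinations(endogenous, i):
            for vals in itertools.product(*(domains[v] for v in subset)):
                do = dict(zip(subset, vals))
                # Backtrack over the already intervened model M^I: intervened
                # variables get constant assignments, all others keep f_i.
                intervened = SCM(
                    scm.exogenous,
                    {v: (lambda c: lambda _: c)(do[v]) if v in do
                        else scm.equations[v]
                     for v in scm.equations},
                    scm.order,
                )
                result = backtrack(intervened, u_observed, target)
                if result is not None and (best is None or result[0] < best[0]):
                    best = (result[0], do, result[2])
        if best is not None:
            return best  # minimal class I_i, in line with preorder (1)
    return None  # the trivial explanation of Sect. 3.2 guarantees termination

# The evidence that was unexplainable in Sect. 3.1 now yields one intervention:
print(iterative_backtrack(scm, {"U0": True}, {"X0": True, "X1": False},
                          ["X0", "X1"],
                          {"X0": [True, False], "X1": [True, False]}))
# -> (0, {'X1': False}, {...})
```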

Trivial Explanation. There always exists at least one explanation with non-zero probability (terminating the algorithm), assuming that every variable has at least one value \(x_i \in \mathcal{X}_i\) with \(P(X_i = x_i) > 0\). Then there exists at least one configuration that is consistent with the given evidence, namely \((\mathbf{x}^*, \{\mathbf{E} := \mathbf{e};\ \mathbf{V} \setminus \mathbf{E} := \mathbf{v}^* \setminus \mathbf{e}\})\) with \(\mathbf{x}^*\) arbitrary and every \(v^* \in \mathbf{v}^*\) chosen such that \(P(V = v^*) > 0\). This explanation intervenes on all variables that are not set by the evidence, fully factorizing the SCM into its trivial decomposition \(P(\mathbf{V}) = \prod_{V_i \in \mathbf{V}} P(V_i) = 1\), such that every variable is determined either by evidence or by intervention, with probability one.

3.3 Default Logic

Most of today's causal literature operates under the assumption that the set of variables is fixed before causal inference is performed. We find this assumption particularly problematic in the field of argumentation, where novel arguments might be added dynamically by different parties to support their positions. It is possible to model hard interventions via instrumental variables, as laid out by Von Kügelgen et al. (2023, Appendix A). A downside of instrumental variables is that these auxiliary variables add complexity to the SCM. While it is easy to attach instrumental variables to any of the original variables, it might be challenging to anticipate all possible interventions that could be performed on the real-world model under consideration. For these reasons, we would like to incorporate interventions only when required. One possible solution to this problem is the adoption of concepts from default logic. In essence, interventions are disregarded during 'normal operation' and considered only when mandatory. Pearl (1988) and Bochman (2023) discuss possible approaches with regard to causality. Still, it is difficult to quantify and compare probabilities taken from an ever-adapting SCM. Probability values returned from a graph under intervention \(\mathbf{I}\) are no longer comparable with those under some other intervention \(\mathbf{I}'\), as the preconditions (specifically, the number of interventions) have changed, resulting in different underlying distributions.

3.4 Integration of Hyperreals

Modeling all possible scenarios as explicit variables comes with the problem that we are usually unable to anticipate every arbitrary intervention that might occur in the future. To tackle this problem, we induce a search order for Algorithm 1 that guarantees that it stops at the minimal \(\mathscr{I}_i\): no matter which explanation \((\mathbf{x}^*, \mathbf{I}_{\bar{\mathbf{V}}})\) is returned, we guarantee that there exists no other explanation \((\mathbf{x}'^*, \mathbf{I}'_{\bar{\mathbf{V}}})\) with fewer interventions, so that we adhere to the ordering of Eq. 1. The main difficulty, however, stems from the inability to encode our preferred order of the \(\mathscr{I}_i\) within the probabilities \(P^{\mathbf{I}_X}_{\mathcal{M}}(\mathbf{x}^*)\) themselves. For two explanations \((\mathbf{x}^*, \mathbf{I}_{\bar{\mathbf{V}}})\) and \((\mathbf{x}'^*, \mathbf{I}'_{\bar{\mathbf{V}}})\) of different classes \(\mathscr{I}_i\) with non-zero support, \(P^{\mathbf{I}_X}_{\mathcal{M}}(\mathbf{x}^*)\) and \(P^{\mathbf{I}'_X}_{\mathcal{M}}(\mathbf{x}'^*)\) might be ordered arbitrarily. In this regard, we propose the small trick of introducing infinitesimal quantities \(\varepsilon\) into the probability estimate of our SCM to make quantities comparable across preference classes.

Hyperreal Numbers (informal). We utilize the concept of hyperreal numbers as an extension of the real numbers \(\mathbb{R}\) (Robinson, 2016). For this, we define an infinitesimal unit \(\varepsilon\) with \(0 < \varepsilon < r\) for all positive \(r \in \mathbb{R}\). Additionally, we make use of the standard part function \(\text{st}(\cdot)\), which maps any hyperreal number to its nearest real number.

Within our SCM we define an auxiliary variable \(X_{\#\mathbf{I}} \in \{0, \dots, N\}\) that counts the number of active interventions. Upon intervening on the SCM we set \(X_{\#\mathbf{I}} := |\mathbf{I}|\) with corresponding probability \(P(X_{\#\mathbf{I}} = n) = \varepsilon^n\). This assignment forms a valid distribution, as \(P(X_{\#\mathbf{I}} = 0) = \varepsilon^0 = 1\) and, for any \(n \ne 0\), \(\text{st}(\varepsilon^n) = 0\). The terms are additive and normalized, up to taking standard parts: \(\sum^N_{n=0} \varepsilon^n = \varepsilon^0 + \sum^N_{n=1} \varepsilon^n = 1 + \sum^N_{n=1} 0 = 1\). In essence, we introduce an auxiliary variable within our model that is added to the joint distribution of our SCM:

$$\begin{aligned} P^{\mathbf{I}}_{\mathcal{M}}(\mathbf{X}) = \left[ \prod\nolimits_{X_i \in \mathbf{X} \setminus \mathbf{I}} P(X_i \mid \operatorname{pa}(X_i)) \right] \cdot P(X_{\#\mathbf{I}}) \end{aligned}$$
(2)

With increasing numbers of interventions, \(P(X_{\#\mathbf{I}})\) takes probabilities of higher-order infinitesimal values \(\varepsilon^1, \varepsilon^2, \dots\), which are totally ordered regardless of the remaining \(P_{\mathcal{M}}(\mathbf{X} \setminus X_{\#\mathbf{I}})\). Taking the standard part of these probabilities, \(\text{st}(P^{\mathbf{I}}_{\mathcal{M}}(\mathbf{X})) = 0\), yields zero probability, which underlines our intuition of considering interventions as external entities within the natural unfolding of our system. Importantly, in a scenario with no intervention present, \(P(X_{\#\mathbf{I}})\) evaluates to \(\varepsilon^0 = 1\), thus preserving the probabilities of the unintervened case.
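
As a sketch of how such probabilities could be compared in practice, one may represent each value symbolically as a pair \((c, n)\) standing for \(c \cdot \varepsilon^n\) with \(n = |\mathbf{I}|\). The class below is our own illustrative construction of this idea, not part of the paper's formalism.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HyperProb:
    """A probability c * eps**n, where n counts the active interventions."""
    c: float  # real coefficient from the ordinary SCM factors in Eq. (2)
    n: int    # order of the infinitesimal, i.e. |I|

    def __mul__(self, other: "HyperProb") -> "HyperProb":
        return HyperProb(self.c * other.c, self.n + other.n)

    def __lt__(self, other: "HyperProb") -> bool:
        # A lower epsilon order (fewer interventions) always dominates; the
        # real coefficient only breaks ties within one explanation class.
        if self.n != other.n:
            return self.n > other.n
        return self.c < other.c

    def st(self) -> float:
        """Standard part: zero whenever any intervention is present."""
        return self.c if self.n == 0 else 0.0

# A likely explanation with one intervention still ranks below an unlikely
# interventionless one, matching the preorder of Eq. (1):
assert HyperProb(0.9, 1) < HyperProb(0.01, 0)
assert HyperProb(0.9, 1).st() == 0.0 and HyperProb(0.9, 0).st() == 0.9
```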

4 Discussion

In this paper, we discussed the use of backtracking counterfactuals as well as classical interventional counterfactuals for deriving explanations. With a view to obtaining arguments from structural causal models, we proposed to choose the most 'natural' explanation, that is, the explanation requiring the fewest interventions. Backtracking counterfactuals seem to be the more natural choice for supporting arguments, as we inherently try to avoid explaining counterfactual outcomes via arbitrary external interventions. On the other hand, interventional counterfactuals can explain cases where backtracking reaches its limits. For this reason, we proposed a basic algorithm that gradually backs up from the interventionless setting towards explanations involving more and more changes to the graph. Through this process, we choose the explanation that requires the fewest interventions, thereby 'identifying the type of counterfactual at hand'. Finally, we made a first attempt to induce our preorder not only on the algorithmic level but also to encode it into the probabilities derived from the SCM itself, by utilizing infinitesimal quantities.

Limitations. Our preference order currently considers only the number of interventions needed to explain the observed evidence. While the number of interventions might act as a sensible proxy for the 'reasonability' of the applied interventions, we expect that future work should investigate the impact and/or plausibility of different interventions in more depth.