
1 Introduction

Recent advances in (often ML-based) artificial intelligence have led to a proliferation of algorithmic decision making (ADM) agents. The risk that these agents may cause harm – and the many demonstrated examples of them already doing so, ranging across numerous domains [3, 8, 19, 30] – has led to a significant demand for technologies to enable their responsible use. In this work, we present \(\textsf{soid}\), a tool based on Judson et al.’s method [16] to account for software systems using computational tools from the fields of formal methods and automated reasoning. The \(\textsf{soid}\) tool is primarily oriented towards supporting legal reasoning and analysis, in order to better understand the ultimate purpose of an agent’s decision making – as is often relied upon by various bodies of law.

Fig. 1. Architecture of the \(\textsf{soid}\) tool.

In particular, rather than traditional verification methods which aim towards proving a specific program property, \(\textsf{soid}\) instead aims to ‘put the agent on the stand’. The design of \(\textsf{soid}\) enables factual and counterfactual querying – underlying a finding of fact – in support of human-centered assessment of the ‘why’ of the agent’s decision making. Such an assessment can then in turn justify holding responsible an answerable owner or operator, like a person or company. We describe the functioning of the \(\textsf{soid}\) tool itself as well as a pair of examples of its use on simulated harms. We also describe the \(\textsf{soid}\hbox {-}\!\textsf{gui}\), a domain-specific interface for \(\textsf{soid}\) applied to autonomous vehicles, allowing for adaptive and interpretable analysis of driving decisions without requiring extensive programming skills or familiarity with formal logical reasoning.

The basic flow of \(\textsf{soid}\), depicted in Fig. 1, is adaptive and requires a human in the loop. The human investigator – likely a practitioner such as a lawyer or regulator, supported as necessary by engineers – uses \(\textsf{soid}\) to better understand the decision making of an agent program A. They do so by finding critical decision moments in the logs of A that transpired in the lead-up to a harm, and then relaxing or perturbing the program inputs to specify a (family of) counterfactual scenario(s). The investigator then formulates a query asking what the behavior of A ‘might’ or ‘would’ have been [20] under that (family of) counterfactual(s). As we show in the design of our \(\textsf{soid}\hbox {-}\!\textsf{gui}\), such questions can even be formulated in user-friendly interfaces that abstract away all of the formal logic and reasoning of \(\textsf{soid}\) for non-technical practitioners. Once a query is posed, a verification oracle using SMT-based automated reasoning – including constrained symbolic execution – provides the investigator with a prompt answer. They can then continue to ‘interrogate the witness’ until they are satisfied that they have a sufficient understanding of the purpose of A’s decisions, and terminate the loop.

Contribution. In summary, we developed a command line tool and Python library \(\textsf{soid}\), which uses symbolic execution (through KLEE) and SMT solving (through Z3) to enable rigorous interpretation of the decision-making logic of an autonomous agent. We demonstrate \(\textsf{soid}\) on a pair of instructive examples involving machine-learned agents. In both cases, we find \(\textsf{soid}\) able to resolve counterfactual queries with reasonable efficiency, even when adaptively posed through the interpretable \(\textsf{soid}\hbox {-}\!\textsf{gui}\) aimed at non-technical practitioners.

A Motivating Example. Consider a program A which evaluates a decision tree in order to classify the diabetes health risk status of an individual, a classic example in automated counterfactuals with legal implications due to [31]. The decision tree and code of A are shown in Fig. 2. However, the software system surrounding A contains an implicit unit conversion bug: A computes the body-mass-index (BMI) input to the decision tree using height and weight parameters from its input. But A expects metric inputs in kg and m, and so computes the BMI without the necessary unit conversion, while the program inputs are instead provided in imperial in and lb. Notably, A is ‘correct’ with respect to natural specifications – as is the decision tree in isolation. The flaw arises from a mistake in the composition of the software system as a whole. Nonetheless, the system misclassifies many inputs, as \((kg/m^2) \gg (lb/in^2)\) for the same quantities.
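To make the failure mode concrete, a minimal sketch of this class of bug follows. The paper’s A is a C program (Fig. 2), so the Python below and all of its names are purely illustrative.

```python
# Illustrative only: a Python analogue of the unit-conversion bug described
# above (the actual agent A is implemented in C, per Fig. 2).

def bmi_metric(weight_kg: float, height_m: float) -> float:
    return weight_kg / (height_m ** 2)

def preprocess_buggy(weight_lb: float, height_in: float) -> float:
    # Bug: imperial inputs are fed straight into the metric formula. The
    # correct computation is roughly 0.4536 * lb / (0.0254 * in)**2, i.e.
    # about 703 * lb / in**2.
    return bmi_metric(weight_lb, height_in)

# For a 150 lb, 65 in individual the true BMI is ~25, but the buggy value is
# 150 / 65**2 ~= 0.036 -- far below every BMI threshold in the decision tree,
# so the classifier sees an implausibly low BMI and misclassifies the input.
```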

Fig. 2. An incorrect decision tree classification. At left, the decision subtree with the incorrect path in bold red and the missed ‘correct’ branch in dashed blue. At right, the decision tree inference logic as implemented in C. (Color figure online)

The goal of \(\textsf{soid}\) is to enable a legal practitioner to understand the presence of and conditions underlying a potential misclassification. Unlike statistical methods for counterfactual analysis which only analyze the (correct) decision model [31], the minimal assumptions underlying \(\textsf{soid}\) – namely, the lack of an assumption that the broader software system correctly uses the decision model – make it a more capable framework for analyzing this type of ‘implicit conversion’ failure. In §2.1 we run a small empirical analysis on A, showing how \(\textsf{soid}\) enables a user to specify concrete factual and counterfactual queries to understand the conditions under which the failure can occur and their implications.

1.1 Related Work

The explainable AI (XAI) and fairness, accountability, and transparency (FAccT) communities have developed numerous methods and tools for enabling accountability of ADMs, machine-learned or otherwise, for which [1, 10, 13] are recent surveys. Many of these tools and techniques focus on counterfactual reasoning in particular [7, 14, 15, 24, 31]. The closest tool to \(\textsf{soid}\) of which we are aware is the VerifAI project [9, 11]. In comparison to the prevailing lines of this research, \(\textsf{soid}\) emphasizes i) after-the-fact (or ex post) analysis for algorithmic accountability in the style of legal reasoning; ii) the use of SMT-based verification technologies capable of resolving counterfactual questions about whole families of scenarios; and iii) analysis of the ‘code as run’, rather than evaluation of a specific component like a particular decision model, or reliance on an abstracted program representation or a formal model of the (often complex social and/or physical) environment the agent operates within.

2 \(\textsf{soid}\) Tool Architecture and Usage

Figure 1 illustrates the architecture of \(\textsf{soid}\). The tool is implemented in Python, and invokes the Z3 SMT solver [26] for resolving queries.

Before working with \(\textsf{soid}\), the investigator must use their domain expertise to find and extract the critical moment they care about from the factual trace within the logging infrastructure of A. We assume some mechanism guarantees the authenticity of the trace, such as an accountable logging protocol, as has been previously proposed for cyberphysical systems [33]. After extracting the trace the investigator must specify i) the (counter)factual query defining the factual, counterfactual, or family of counterfactual scenarios the query concerns; as well as ii) some possible agent behavior. In the remainder of this section, we explain how the user does so using \(\textsf{soid}\) and a Python library interface it exposes called \(\textsf{soidlib}\). Constraints are specified through an API similar to Z3Py (see Fig. 3), while queries can be written as independent Python scripts or generated dynamically within a Python codebase.

Upon invocation, \(\textsf{soid}\) symbolically executes A to generate a set of feasible program paths as constrained by the (counter)factual query. The constraints in that query must be provided directly to the symbolic execution engine – an integration API exposes the query to the symbolic execution engine in order to enable this communication, or the user can do so directly outside \(\textsf{soid}\) itself. After the symbolic execution completes, \(\textsf{soid}\) constructs the query formula and invokes Z3 to resolve it. It then outputs to the user the finding, as well as any model – which exists in the event of a failed ‘would’ or successful ‘might’ query.
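Reading the description above, the two query types have roughly the following shape, where the \(\pi_i\) are the path conditions produced by the constrained symbolic execution, \(\varphi\) is the (counter)factual constraint, and \(\beta\) is the queried behavior; this is our simplified sketch rather than the formal definitions of [16]:

\[ \textit{would}:\ \text{check the validity of}\ \Big(\big(\textstyle\bigvee_i \pi_i\big) \wedge \varphi\Big) \rightarrow \beta, \qquad \textit{might}:\ \text{check the satisfiability of}\ \big(\textstyle\bigvee_i \pi_i\big) \wedge \varphi \wedge \beta. \]

Under this reading, a failed ‘would’ check yields a counterexample model and a successful ‘might’ check yields a witness model, matching the output behavior described above.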

Fig. 3. A counterfactual specified using \(\textsf{soidlib}\) for a simplified grid-based car crash implementation (also available within our codebase alongside our \(\textsf{soid}\hbox {-}\!\textsf{gui}\)). This query leaves the turn signal of the ‘other’ car at (2, 1) unconstrained, defining a counterfactual family. The objects , , and are user-specified in an omitted function, including datatype.

Query API. The query API of \(\textsf{soid}\) is exposed as a Python library called \(\textsf{soidlib}\). A query specified using \(\textsf{soidlib}\) is composed of a name and query type, as well as a set of functions. These functions return either \(\textsf{soidlib}\) variable declarations or constraints, which are in either case automatically encoded into a set of corresponding Z3Py constraints for use during SMT solving to establish the satisfiability or validity of the query. An example query is shown in Fig. 3. The main API functions the user must define in order to encode their query are as follows (an illustrative sketch is given after the list):

  •   : A function that must return three dictionaries of \(\textsf{soidlib}\) variable declarations, enumerating the set of environmental inputs ( ) and internal state inputs ( ) over which the factual or (family of) counterfactual scenario(s) are defined, as well as the set of decision ( ) variables over which the behavior is defined. In order to do this \(\textsf{soidlib}\) exposes a variety of variable types, which it then converts into Z3 statements with the appropriate logical sorts as required by the underlying SMT logic (e.g., encoding an object of integer type as an object of the 32-bit bitvector sort).

  •   : A function that must return a \(\textsf{soidlib}\) constraint over E describing the environmental program inputs.

  •   : A function that must return a \(\textsf{soidlib}\) constraint over S describing the internal state program inputs.

  •   : An optional function that returns a \(\textsf{soidlib}\) constraint encoding a concrete factual to be negated from the query formula, and therefore excluded from the set of possible output models.

  •   : A function that must return a \(\textsf{soidlib}\) constraint over D describing the behavior being queried.
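The following is a hedged sketch of what such a query can look like. Because the exact \(\textsf{soidlib}\) identifiers are elided above, it is written directly against Z3Py (which the \(\textsf{soidlib}\) API resembles); every function and variable name here is hypothetical.

```python
# Hypothetical query sketch in plain Z3Py; soidlib's own types and function
# names differ and are not reproduced here.
import z3

def declare():
    # Environmental inputs E, internal-state inputs S, and decision variables D.
    E = {'other_signal': z3.BitVec('other_signal', 32)}     # other car's turn signal
    S = {'ego_speed':    z3.FP('ego_speed', z3.Float32())}
    D = {'ego_moves':    z3.Bool('ego_moves')}
    return E, S, D

def environmental(E):
    # Leaving the turn signal unconstrained defines a counterfactual family,
    # in the spirit of the Fig. 3 example.
    return z3.BoolVal(True)

def state(S):
    # Constrain the internal state: the ego car is currently moving.
    return z3.fpGT(S['ego_speed'], z3.FPVal(0.0, z3.Float32()))

def behavior(D):
    # The behavior being queried: does the ego car decide to move?
    return D['ego_moves']
```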

Language Support. Through a modular API, \(\textsf{soid}\) supports any symbolic execution engine that produces output in the SMT-LIB format [4]. An integrator needs only to write a Python class implementing an interface between \(\textsf{soid}\) and the engine. As such, \(\textsf{soid}\) supports agents written in any programming language for which a suitable symbolic execution engine is available. We use the KLEE family of symbolic execution engines throughout our benchmarks. At present, support is integrated into \(\textsf{soid}\) for C language programs with floating-point instructions using KLEE-Float [21], working over the SMT logic \(\texttt {QF\_FPBV}\), the quantifier-free theory of floating-point and bitvectors. Support is also integrated for C and C++ language programs without floating-point using mainline KLEE [5], producing representations in \(\texttt {QF\_ABV}\), the quantifier-free theory of arrays and bitvectors. KLEE can be further extended to analyze other LLVM-compilable languages such as Rust [22], while other engines exist for compiled binaries [29] and many other languages including Java [2] and JavaScript [23].

Symbolic Execution API. Adding support for a new symbolic execution engine to \(\textsf{soid}\) requires specifying between two and five functions: , , , , and , which are all hooked into the main \(\textsf{soid}\) execution path. Only and are necessary – they must respectively invoke the symbolic execution and then process the output into a list of Z3Py statements capturing the possible path conditions. Optionally, provides a hook for cleaning up temporary or output files generated by the symbolic execution engine, while and are designed to automate additional steps that may be desirable for the symbolic execution – the former is given access to the query, the latter additionally to the set of variables declared along the path conditions. For example, KLEE-Float automatically converts arrays into bitvectors using a technique called Ackermannization [25], and renames any such variables in the process. The KLEE-Float function packaged with \(\textsf{soid}\) i) casts objects as necessary; and ii) constrains them to equal the corresponding input declarations in the function so that they alias those inputs, e.g., adding the constraint where is KLEE-Float’s synthesized, Ackermannized representation of .
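As a hypothetical illustration of the adapter shape described above: the real hook names are elided in the text, so every identifier and the engine command below are assumptions rather than \(\textsf{soid}\)’s actual interface.

```python
# Hypothetical adapter sketch; soid's actual hook names and signatures are
# not reproduced here.
import pathlib
import shutil
import subprocess
import z3

class SymbexAdapter:
    def run(self, program_path, query):
        # Required hook #1: invoke the symbolic execution engine on A,
        # ideally constrained by the (counter)factual query.
        subprocess.run(['some-symex-engine', program_path], check=True)

    def parse(self, output_dir):
        # Required hook #2: convert the engine's SMT-LIB output into Z3Py
        # statements capturing the feasible path conditions.
        return [z3.parse_smt2_file(str(p))
                for p in sorted(pathlib.Path(output_dir).glob('*.smt2'))]

    def cleanup(self, output_dir):
        # Optional hook: remove temporary files produced by the engine.
        shutil.rmtree(output_dir, ignore_errors=True)
```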

Query to Symbolic Execution. One of the major benefits of the ex post method of \(\textsf{soid}\) is that the (counter)factual query specified by the user can be used to constrain which parts of the program A are relevant to the scenarios in question and therefore must be included in the formula being checked. To do so, however, the query must also be exposed to the symbolic execution engine, so as to limit the symbolic execution to just the (ideally small) set of program paths feasible under the (counter)factual scenario conditions. This can either be done independently of \(\textsf{soid}\), e.g., by the code invoking \(\textsf{soid}\) when it is used as a library, or by using the hook in the symbolic execution framework. At present, our codebase exclusively uses the external method.

Invocation. There are two ways to use \(\textsf{soid}\): through a command line script (the soidcli) or directly as a Python library. If the latter, the user calling the code must declare a object and configure it with i) a ; ii) the path to A; and iii) the identity of the symbolic execution engine. If using the soidcli, the CLI script declares the oracle object for the user, who must specify the path to where (a collection of) objects can be found declared in independent Python scripts (as well as the same path to A and symbolic execution engine identity). In case multiple variants of A are required in order to specify different symbolic execution preconditions for different counterfactual families, \(\textsf{soid}\) passes an identifier corresponding to an index that the user can specify through the CLI interface. In the examples present in the \(\textsf{soid}\) codebase this identifier is passed to a Makefile, which is then used to invoke KLEE(-Float) on the correct variant.

2.1 Example #1: Decision Tree Inference

Using \(\textsf{soid}\), we analyzed our decision tree misclassification motivating example. The results are summarized in Table 1, and were gathered on an Intel Xeon CPU E5-2650 v3 @ 2.30GHz workstation with 64 GB of RAM. We used scikit-learn [27] to train a decision tree over the Pima Indians dataset as used in [31]. We then implemented A as a C program that preprocesses the data – triggering the software system bug, as it does so without the necessary unit conversion – and then infers a binary classification using the decision tree. In order to create the factual basis for an investigation, we then invoked A on an example input where the unit conversion bug leads to the misclassification of the input as low risk instead of high risk.
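A hedged sketch of the training step described above follows; the CSV path, column names, and hyperparameters are assumptions rather than the paper’s exact setup.

```python
# Assumed setup: a local CSV copy of the Pima Indians dataset with an
# 'Outcome' label column; hyperparameters are illustrative only.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv('pima-indians-diabetes.csv')
X, y = df.drop(columns=['Outcome']), df['Outcome']
clf = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)

# The learned thresholds would then be transcribed into the C program A, whose
# preprocessing step computes BMI without the required unit conversion.
```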

Table 1. Benchmark results for our incorrect statistical inference example.

We posed two queries:

  1. Did the classification happen as described?

  2. Does there exist a weight input parameter for which the instance is instead classified as high risk?

The former query provides a baseline for how much the counterfactual possibility of the latter query increases the cost of solving. It also fulfills the natural goal of many accountability processes to formally confirm apparent events and create an end-to-end chain of analysis, so that there is the highest possible societal confidence in any policy changes or punishments derived from it. Both of these queries were resolved by \(\textsf{soid}\) in the positive, requiring at most a few seconds, even over a program structure in A that includes recursive invocations of floating-point comparison operations. Together, they demonstrate that the weight input to A was causal for the classification, and establish its lack of unit conversion as contributory to the (harmful) misclassification decision.

Working with A, \(\textsf{soid}\) provides an adaptive oracle allowing the investigator to query its behavior and receive prompt and useful answers. The output of the program is also simple and interpretable. Without an intermediating GUI or developer tools, \(\textsf{soid}\) does require comfort with its API and the logical framework of expressing (counter)factuals and program outputs, but we do not expect a usable interface would be meaningfully difficult to integrate for this example.

3 \(\textsf{soid}\hbox {-}\!\textsf{gui}\) Architecture and Usage

The \(\textsf{soid}\hbox {-}\!\textsf{gui}\) is a web-based interactive interface for \(\textsf{soid}\) applied to the domain of autonomous vehicle accountability. It demonstrates that the use of \(\textsf{soid}\) can be managed by a high-level abstraction that exposes to non-technical practitioners the expressiveness and capacity of the tool, but none of its logical or technical complexity. We demonstrate the design and use of the \(\textsf{soid}\hbox {-}\!\textsf{gui}\) in Fig. 4.

Architecturally, the \(\textsf{soid}\hbox {-}\!\textsf{gui}\) is composed of three main components: i) a frontend written in React; ii) a backend server written in Python that operates a vehicle simulation using the Duckietown simulator for the OpenAI Gym (henceforth Gym-Duckietown [6]) and also interfaces with \(\textsf{soid}\); and iii) a proxy server that manages communication between the browser frontend and the server backend. The Duckietown simulation is used as a stand-in for the real vehicle logs and instrumentation on which \(\textsf{soid}\) would be deployed in practice. We designed the crossroads intersection simulation interface to mimic the real-time driving context interface generated by contemporary autonomous vehicles, like those produced by Tesla. We stress that Gym-Duckietown is not exposed to \(\textsf{soid}\), which operates exclusively over the program (and decision model) A. Gym-Duckietown is used only to simulate crashes and generate logfiles as the basis for \(\textsf{soid}\) queries.

Outside of the \(\textsf{soid}\) investigatory loop, the user can first use the \(\textsf{soid}\hbox {-}\!\textsf{gui}\) to design a car crash scenario by manipulating the location, destination, and other properties of the simulated car through menus and a drag and drop interface (see Fig. 4). The \(\textsf{soid}\hbox {-}\!\textsf{gui}\) also allows the user to select from among five different decision logics for the ego car: a directly programmed ‘ideal’ car, and four reinforcement-learned (specifically, Q-learned [32]) agents, colloquially the ‘defensive’, ‘standard’, ‘reckless’ and ‘pathological’ decision models. They are so named on the basis of the reward profiles used to train them.
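For reference, Q-learning [32] trains an agent with the standard tabular update shown below; the paper does not detail the four cars’ training setups beyond their differing reward profiles, so only the generic rule is given here.

\[ Q(s,a) \leftarrow Q(s,a) + \alpha\big[\, r + \gamma \max_{a'} Q(s',a') - Q(s,a) \,\big] \]

The reward signal \(r\) encodes the differing profiles that give the ‘defensive’, ‘standard’, ‘reckless’, and ‘pathological’ models their names.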

Fig. 4. After the (simulated) execution, the investigator (1) selects a critical moment; (2) poses a counterfactual query; (3) invokes the SMT solver; and (4) is presented with the response from the oracle.

After an iteration of the simulation (usually, after a crash occurs), the \(\textsf{soid}\hbox {-}\!\textsf{gui}\) allows the user to operate the \(\textsf{soid}\) investigatory loop. Using a slider, the user can pick out a moment from the logs of the agent and, supported by detailed logging information about the inputs to A at each timestep, select the critical moment (see Step 1 in Fig. 4). They can then use car-specific dropdown menus to specify counterfactuals about any of the agents in the system in a user-friendly manner, which fully abstracts away the underlying logical formalism (Step 2 in Fig. 4). Finally, they can invoke \(\textsf{soid}\) on the query they have specified by asking whether the ego car ‘might’ or ‘would’ move or stop under the (family of) counterfactual scenario(s) they have defined (Step 3 in Fig. 4). After solving, the \(\textsf{soid}\hbox {-}\!\textsf{gui}\) presents an interpretable answer, including a valuation for any variables the counterfactual was stated over when one is available (Step 4 in Fig. 4). The user can then clear or adjust their counterfactual statement and ask further queries, until satisfied they have reached an understanding of the car’s decision making under the selected decision model.

To use \(\textsf{soid}\), the \(\textsf{soid}\hbox {-}\!\textsf{gui}\) first writes out a C language file with the necessary constraints for the KLEE-Float symbolic execution. It then creates the and objects, allowing it to invoke \(\textsf{soid}\) through the Python library interface. Once \(\textsf{soid}\) has invoked KLEE-Float and Z3 to determine the answer to the query, the output is processed, including model parsing when applicable. The result is then passed back to the browser frontend to be shown to the user.

3.1 Example #2: Three Cars on the Stand

We use the \(\textsf{soid}\hbox {-}\!\textsf{gui}\) to investigate the crash shown in Fig. 4. It is a simple intersection scenario, where the blue ‘ego’ car under investigation strikes the broadside of the red ‘other’ car, which has indicated a right turn but proceeded straight nonetheless. As the red car possesses the right of way, the fault lies with the blue car. We investigate ‘to what purpose’ the blue car entered the intersection, in order to grade the severity of its misconduct in conjunction with legal norms that frequently apply the greatest possible penalties to purposeful action [16]. Notably, this crash occurs for all three of the ‘standard’, ‘reckless’, and ‘pathological’ decision models (but not the ‘defensive’ model).

Table 2. Benchmark results for our car crash example. For the final query, we phrased it as both a ‘would’ and a ‘might’ counterfactual for comparison.

We pose three queries about the blue car’s decision making at the moment when it releases the brakes and enters the intersection (Step 1 in Fig. 4):

  1. Did the blue car actually decide to move, as it appeared to?

  2. Could a different turn signal have led the blue car to remain stationary?

  3. If the blue car had arrived before the red car and the red car was not signaling a turn, might the blue car have waited to ‘bait’ the red car into entering the intersection and creating the opportunity for a crash?

Intuitively, the second question should distinguish the ‘standard’ car from the ‘reckless’ and ‘pathological’, which should continue to move into the intersection no matter what. The third question should then distinguish between the ‘reckless’ and ‘pathological’ cars, with the former taking the opportunity for a clean path through the intersection, while the latter lies in wait.

There are natural explanations for the behavior of the other decision models: the ‘standard’ car is undertaking common human driving behavior given the perception of an unobstructed path through the intersection, the ‘reckless’ car demonstrates a prioritization of individual speed over collective safe driving, while the ‘pathological’ car might be attempting to trigger a crash for insurance fraud. Notably, in the case of the ‘reckless’ car, we do not want to inherently describe that behavior as incorrect as verification methods might, such as any implementing [28]. It could be that exigent circumstances necessitate reckless behavior, and that the blue car not entering the intersection as fast as possible would trigger a greater harm than a minor crash.

The results of our benchmarks are summarized in Table 2. As before, all of the statistics were gathered on an Intel Xeon CPU E5-2650 v3 @ 2.30GHz workstation with 64 GB of RAM. Each heading in Table 2 describes a family of (counter)factual scenarios and behavior, as well as whether the query is a verification (‘would...?’) or counterfactual generation (‘might...?’) one. The rows list the decision model invoked within A, the answer as determined by the verification oracle, timings, and the total number of feasible paths.

We find that \(\textsf{soid}\) provides an interpretable and adaptive oracle allowing the investigator to query a sequence of counterfactuals without directly interacting with A or the machine-learned model underlying it. Most of our queries resolved in under 20s, providing effective usability. The results of the queries demonstrate the distinctive behaviors expected of the three conflicting purposes, allowing a capable investigator to distinguish them as desired.

4 Conclusion

We briefly conclude by considering some future directions for extensions to \(\textsf{soid}\).

Supporting DNNs. Many modern machine-learned agents rely on models built out of deep neural network (DNN) architectures. Extending \(\textsf{soid}\) to support such agents – most likely by relying on recent innovations in symbolic execution for neural networks [12] and SMT-based neural network verifiers [17, 18] – is a possible direction for increasing the utility of \(\textsf{soid}\).

Programming Counterfactuals. Although \(\textsf{soid}\) is adaptive, that does not necessarily mean it needs to be interactive. A further possible direction would be to design a counterfactual calculus as the basis for a programming language that would invoke \(\textsf{soid}\) as part of its semantics. Such a language could potentially be the basis for formalizing legal regimes for which counterfactual analysis forms a critical component. A related direction would be to integrate with a scenario specification language like SCENIC from the VerifAI project [9, 11] to add another layer of capability onto the specification of families of counterfactuals.