The tracking and tracing conditions provide a philosophical grounding for the development of human-AI systems under meaningful human control. Yet, translating these philosophical concepts into concrete design and engineering practice is far from trivial. For instance, the tracking condition suggests that a human-AI system should be responsive to the moral reasons of a relevant human. But how do we define the relevant human in a given circumstance? How should a given AI system recognize moral reasons? Does the condition imply that every AI system should be designed to be morally sensitive? The tracing condition implies the necessity of a proper moral and technical understanding on the part of at least one relevant human who interacts with or designs the system. Does this imply that the AI system should be able to recognize if and when an interacting human has such proper moral and technical understanding? Or does this imply that we need protocols for the design and use of AI systems that define if and when a human can and must have such understanding?
In an effort to answer these questions, and more, the authors, a group of researchers from various backgrounds (engineering, computer science, philosophy of technology, ethnography and design), engaged in an iterative process of abductive thinking. Specifically, we built on Dorst’s conceptual framework of abductive thinking, where both a desired value (meaningful human control) and a working principle (tracking and tracing conditions) are known, to brainstorm ideas of what the solutions to achieve these might be. The generated ideas were then grouped into thematic areas and synthesized into actionable properties. It should be noted, however, that although this work explores the solution space of the framework, our aim is rather to provide a contribution that sits on a meta-level, in between the what and the how (see Fig. 1).
Specifically, as a result of our abductive thinking where we collectively reflected on what strategic and engineering solutions would enable the two necessary conditions of meaningful human control, we identified a set of four actionable properties. In the following subsections we describe these properties in detail and illustrate their practical implications. To better highlight the properties and the implications, we make use of two example application scenarios: automated vehicles and AI-based hiring. Both cases manifest an urgent need for meaningful human control in non-forgiving scenarios that strongly impact people’s lives (e.g., bodily harm, unfair decisions, discrimination), and their differences with respect to time constraints, embodiment, and involved stakeholders juxtapose different aspects of realizing these properties in human-AI systems.
Property 1. The human-AI system has an explicit moral operational design domain (moral ODD) and the AI agent adheres to the boundaries of this domain
As the human-AI system has to be “responsive to relevant human moral reasons” (i.e., the tracking condition), we need to identify the relevant humans, their relevant (moral) reasons, and the circumstances in which these reasons are relevant. To this end, specifying the technical conditions in which the system is designed to operate is not sufficient. Designers should consider a larger design space, one that captures also the values and societal norms that must be considered and respected during both design and operation.
Building on the concept of operational design domain (ODD), which originates in the automotive domain, we name this larger design space the moral operational design domain (moral ODD). The concept of ODD is often used in the context of automated driving and refers to a set of contextual conditions under which a driving automation system is designed to function: outside of it a human driver is responsible. Specific contextual properties of the automotive ODD typically include factors like road structure, road users, road obstacles and environmental conditions (material elements), as well as human-vehicle interactions and expected vehicle interactions with pedestrians (relational elements). In terms of legal responsibility, the ODD constitutes a selection of operation scenarios that can be safely managed by the automation and in which undesired consequences are minimized. As such, we believe it is a valuable concept to extend beyond automated vehicles to human-AI systems in general.
The current conceptualization of the ODD strongly focuses on the technical aspects of operation and on the goal of extending the context boundaries of the ODD. However, consideration of the wider societal implications is lacking. Similar to Burton et al., we argue that the concept of ODD should also emphasize the broader social and ethical implications. We propose this extended concept of ODD so that functional considerations of where and when a human-AI system can operate are complemented by a definition of the domain in which a system ought or ought not to operate from a moral perspective (Fig. 2).
A simple example of a hammer illustrates the difference between the “can” (e.g., material and relational elements) and the “ought to” dimensions (e.g., moral elements). From a purely functional perspective, a hammer “can” be used as a weapon against another person. However, the morally acceptable use of a hammer is for hammering nails (ought to), not to injure other people (should not). Common sense already tells us that the use of a hammer as a weapon is in most cases morally unacceptable (can but should not). It is clear that the responsibility for proper use lies with the user, not the manufacturer (except in cases where the hammer clearly does not function properly, e.g., the head suddenly comes loose from the handle and injures a person).
In a scenario involving complex human-AI systems, this is often much less clear cut. In the automated vehicle case, the moral ODD could contain moral reasons representing safety (e.g., avoid road accidents), efficiency (e.g., reduce travel time), and personal freedom (e.g., enhance independence for seniors), to name just a few. In the AI-based hiring context, moral reasons could include, from the employer’s side, reducing discrimination or increasing the number of applicants in the recruitment process, while for the applicants’ side autonomy over self-representation could be considered very relevant. In both contexts, however, there might be tensions among different moral reasons and stakeholders, requiring an inclusive specification and careful communication of the moral ODD.
The specification and clear communication of the moral ODD support relevant humans (e.g., users, designers, developers) to be aware of the moral implications of the system’s actions and their responsibility for these actions, thereby supporting the tracing condition of meaningful human control. Furthermore, if the operation of the AI agent remains confined within the boundaries of what it “can do” and “ought to do”, the tracking condition of meaningful human control is supported as well, as this makes the human-AI system more responsive to human understanding of what is the morally appropriate domain and mode of operation. Achieving these benefits requires that: (1) the moral ODD be explicitly defined; (2) the AI agent embed concrete solutions to constrain the actions of the human-AI system within the boundaries of the ODD.
To define the moral ODD, designers and developers need to engage with fundamental questions of what are the elements composing the moral ODD and how do the features of each element affect the system’s behavior. The process starts with an ontological modelling of the environment(s) in which the human-AI system is expected to operate. Such complex assemblage of elements and relationships could be meaningfully represented within the moral ODD by making use of principles from existing research on software applications where ontologies are developed to enable context-aware computing systems [50, 51]. The mapping of material and relational elements characterizing a domain should be complemented with an investigation of what might be the morally relevant reasons, what they represent in the specific context, assumptions and consequences related to the system operation. Such understanding of the moral landscape of an AI agent under development could be built by means of extensive literature and case reviews [52,53,54], participatory approaches such as interviews, interactive workshops, and value-oriented coding of qualitative responses , which can be supported by natural language processing algorithms .
How to satisfy the second requirement (constraining the AI agent to the boundaries of the moral ODD) varies according to the constituent elements of the moral ODD. When constraining the material and relational aspects of the system behavior, approaches developed in the automotive and aircraft domains can be a useful reference, e.g., risk-based path planning strategies for unmanned aircraft systems in populated areas  and geofencing . Relational aspects can be addressed through envelope protection. In the aircraft domain, flight envelope protection systems prevent the pilot from making control commands that drive the aircraft outside its operational boundaries, a concept that has also been adopted for unmanned aerial vehicles . This concept could be extended beyond the aircraft domain, and become a more general design pattern for constraining the relational elements of the moral ODD in the systems involving both embodied and non-embodied AI agents .
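To make the two referenced mechanisms more tangible, the following is a minimal sketch of envelope protection and geofencing for a single control variable and a rectangular area. The names `Envelope`, `protect`, and `inside_geofence`, as well as all numeric limits, are our illustrative assumptions and are not taken from the cited works; real systems typically use multi-dimensional envelopes and polygonal fences.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical operational boundaries for a single control variable."""

    lower: float
    upper: float

    def protect(self, command: float) -> tuple[float, bool]:
        """Clamp a command to the envelope.

        Returns the (possibly limited) command and a flag telling whether
        the original command had to be limited.
        """
        limited = min(max(command, self.lower), self.upper)
        return limited, limited != command


def inside_geofence(
    x: float, y: float, fence: tuple[float, float, float, float]
) -> bool:
    """Check whether position (x, y) lies inside a rectangular geofence.

    `fence` is (x_min, y_min, x_max, y_max). A rectangle is an assumption
    that keeps the sketch short.
    """
    x_min, y_min, x_max, y_max = fence
    return x_min <= x <= x_max and y_min <= y <= y_max


# Illustrative values only: pitch limits in degrees and a 100 m x 100 m area.
pitch_envelope = Envelope(lower=-15.0, upper=30.0)
safe_command, was_limited = pitch_envelope.protect(45.0)
in_area = inside_geofence(20.0, 120.0, (0.0, 0.0, 100.0, 100.0))
```

Returning the `was_limited` flag alongside the clamped command is a deliberate choice in this sketch: it lets the interface inform the human that their command was overridden, rather than silently constraining it.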
Moral constraints are arguably the most challenging to enforce. One possible way of imposing them is to set probabilistic guarantees on system outcomes . However, these approaches might not hold in real-world applications. Due to the non-quantifiable nature of morally relevant elements, as well as moral disagreements among humans, the boundaries of the moral ODD will remain blurred . Hence, it is crucial that humans, not AI agents, are empowered to be aware of their responsibilities to make conscious decisions if and when the human-AI system should deviate from the boundaries defined by the moral ODD. The assessment of whether and how an AI agent is confined to the moral ODD is not a binary check, but rather a contextualized and deliberated analysis of the interaction between the AI agent, human agents, and the social, physical, ethical, and legal environment surrounding them. Humans, to conclude, should have an understanding of such blurry boundaries of the moral ODD and their responsibility to meaningfully control the AI agent in this process. Importantly, this includes the possibility of deciding that the use of an AI agent is not acceptable in certain contexts.
Property 2. Human and AI agents have appropriate and mutually compatible representations of the human-AI system and its context
For a human-AI system to perform its function, both humans and AI agents within the system should have some form of representations of the involved tasks, role distributions, desired outcomes, the environment, mutual capabilities and limitations. Such representations are often referred to as mental models; these models enable agents to describe, explain and predict the behavior of the system and decide which actions to take [61,62,63].
Shared representations, i.e., representations that are mutually compatible between human and AI agents within the system, allow the agents to have an appropriate understanding of each other, the task, and the environment, which helps agents to cooperate, adapt to changes, and respond to relevant human reasons. To ensure safe operation of the system, agents should also have a shared representation of each other’s abilities and limitations. Specifically, the AI agents should account for humans’ inherent physical and cognitive limitations, while human agents should account for the AI agents’ limitations to avoid issues such as overreliance. Furthermore, and crucially for achieving meaningful human control, these shared representations should include the human reasons identified in the moral ODD (Fig. 3), which can change over time and across contexts. Because elements of the shared representations are dynamic, the human and AI agents should be able to update their representations of the potentially changing reasons accordingly.
Incompatibility between representations could result in a lack of responsiveness to human reasons, thereby leading to undesired outcomes with significant moral consequences. For example, inconsistent mental models between a human driver and an automated vehicle about “who has the control authority”, in which the human driver believes that the automated vehicle has control and vice versa, could result in a critical and unsafe system state.
In order for the agents’ shared representations to facilitate the system’s tracking of relevant human reasons, the system designers first need to define which aspects of the system and its context (including relevant humans, AI agents, the environment, and the moral ODD) each agent should have a representation of. The process of determining what kinds of representations are needed will be context-specific and depend on the moral ODD of the system. A useful approach to determine the necessary representations and to translate these high-level concepts into practical design requirements is co-active design . Specific to building and maintaining shared representations, this approach provides guidelines on how to establish observability and predictability between the human and AI agents, including what needs to be communicated and when .
Representations can include practical matters such as task allocation, role distribution and system limits, but also an understanding of how humans perceive the AI agents, human acceptance of and trust in the human-AI system, human values and social norms. Defining the representations should also include determining their appropriate level of detail. For instance, for an automated vehicle to interact with a pedestrian, the designers need to determine whether it suffices for the vehicle to have a representation of just the location of a pedestrian on the road and their movement trajectory, or also the height and age of that human, their goals and intentions. In the context of AI-based hiring, a key aspect requiring shared representation is the meaning of competence. In particular, the meanings of soft skills, such as teamwork and creativity, are highly fluid, context-dependent, and contestable. Therefore, aligning the job-specific meaning of competence among job seekers, employers, and any AI agent involved in the hiring process is critical.
Once the representations required for each agent are defined, the design and engineering choices need to sufficiently take these into account. Specifically, such choices should facilitate (1) AI agents to build and maintain representations of the humans and their reasons, and (2) humans to form mental models of AI agents and the overall human-AI system. These shared representations can be achieved through various combinations of implicit (e.g., through interaction between agents) or explicit ways (e.g., by means of human training, verbal communication). For example, to allow humans to build and maintain a representation of an AI agent, it can be developed to be observable and predictable implicitly through its design (e.g., glass-box design ), allowing the operator to better understand the AI agent’s decision-making. Ecological interface design can also leverage knowledge on human information processing to design human-AI interfaces that are optimally suited to convey complex data in a comprehensible manner . Maintaining accurate representations during the human-AI system’s deployment can also occur through interaction, either implicitly (e.g., through intent inference from observed behavior) or explicitly (e.g., explicit verbal or written messages). For example, an AI system can probe through behavior whether the human is aware of its intentions before committing to a decision .
For the AI agents to have appropriate representations of human agents, the assumptions about human intentions and behavior adopted by AI agents (either implicitly or explicitly) need to be validated. This can be aided by incorporating theoretically grounded and empirically validated models of humans in the interaction-planning algorithms of AI agents [69, 70], or by augmenting bottom-up, machine-learned representations with top-down symbolic representations [71, 72]. An alternative approach, value alignment , aims to mitigate the problems that arise when autonomous systems operate with inappropriate objectives. In particular, inverse reinforcement learning (IRL), which is often used in value alignment, aims to infer the “reward function” of an agent from observations of the agent’s behavior, also in cooperative partial-information settings (cooperative IRL) . Although IRL is likely not sufficient to infer human preferences from observed behaviour since human planning systematically deviates from the assumed global rationality , such approaches could still support agents to maintain aligned shared representations .
Property 3. The relevant humans and AI agents have ability and authority to control the system so that humans can act upon their responsibility
Relevant humans should not be considered just mere subjects to be blamed in case something goes wrong, i.e., an ethical or legal scapegoat for situations when the system goes outside the moral ODD. They should rather be in a position to act upon their moral responsibility by influencing the AI system throughout its operation, and to bring the system back to the moral ODD if needed (Fig. 4).
This is only possible when the distribution of roles and control authority between humans and AI (“who is doing what and who is in charge of what”) is consistent with their individual and combined abilities, including reasonable mechanisms for overruling the AI agent by intervening and correcting its behavior, setting new goals, or delegating sub-tasks.
Flemisch et al.  provide a thorough account on the importance of an appropriate balance between an agent’s ability, authority, and responsibility in human–machine systems: ability to control should not be smaller than control authority, and control authority should not be smaller than responsibility. We argue that this account applies to complex human-AI systems as well. The ability of a human or AI agent includes their skill and competence to perceive the state of a system and the environment. This also includes a way to acquire and analyze relevant information, to make a decision to act, and to perform that action appropriately . Ability also includes the resources at their disposal, such as tools (an autonomous vehicle without a steering wheel would severely hamper the human’s ability to control the vehicle’s direction; job candidates’ ability to represent themselves would be heavily impaired by the lack of a feedback mechanism) or time (an automated vehicle that would wait until the very last second to alert the driver of a dangerous situation also limits the driver’s ability to direct the vehicle to safety; an employer would have no control and understanding of an AI-based hiring system if assessment of candidates would be provided only after the selection process finishes).
The understanding of an (AI or human) agent’s ability is intrinsically related to the socio-technical context in which the system is embedded. Hence, it is important that tasks are distributed according to the agent’s ability in the context, not only from a functional perspective, but also accounting for the values and norms intrinsic to the activity. Approaches such as the nature-of-activities [78, 79], under the umbrella of Value Sensitive Design , can support the understanding of which set of tasks should be (partially or totally) delegated or shared with AI agents, and which should be left exclusively to humans. Given the collaborative nature of many human-AI systems, team design patterns can be used as an intuitive graphical language for describing and communicating to the team the design choices that influence how humans and AI agents collaborate [80, 81].
The second component of the account proposed by Flemisch et al. is control authority, i.e., the degree to which a human or AI agent is enabled to execute control. Consistency between authority and ability requires that an agent’s authority does not exceed their ability. Similarly, responsibility should not exceed authority. Thus, an agent should be responsible only for tasks they have authority to perform, and they should have authority only over tasks they are able to perform. A key implication of this consistency is that control is exerted by the agent that has sufficient ability and authority, and more responsibility is carried by the agents that exert more control. While ability and authority are attributes that both human and AI agents possess, we consider responsibility as a human-only quality. Therefore, the ability and authority of a human-AI system must be traced to responsibilities of relevant humans, e.g., engineers, designers, operators, users, and managers.
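The consistency relation (ability not smaller than authority, authority not smaller than responsibility, and responsibility reserved for humans) can be expressed as a simple design-time check. The sketch below is only an illustration under strong assumptions: the ordinal 0 to 3 scores, the `AgentRole` type, and the function `consistency_issues` are hypothetical, and in practice these attributes are not readily reducible to single numbers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    """Hypothetical ordinal scores, e.g. 0 (none) to 3 (full), for one task."""

    name: str
    is_human: bool
    ability: int
    authority: int
    responsibility: int


def consistency_issues(role: AgentRole) -> list[str]:
    """List violations of ability >= authority >= responsibility."""
    issues = []
    if role.authority > role.ability:
        issues.append(f"{role.name}: authority exceeds ability")
    if role.responsibility > role.authority:
        issues.append(f"{role.name}: responsibility exceeds authority")
    if not role.is_human and role.responsibility > 0:
        issues.append(f"{role.name}: responsibility assigned to a non-human agent")
    return issues


# Illustrative roles: a driver with final authority but degraded ability,
# and a driving automation that carries no responsibility.
disengaged_driver = AgentRole("driver", True, ability=1, authority=3, responsibility=3)
driving_automation = AgentRole("automation", False, ability=2, authority=2, responsibility=0)

driver_issues = consistency_issues(disengaged_driver)
automation_issues = consistency_issues(driving_automation)
```

A check of this kind can at best flag candidate inconsistencies for discussion among designers; it does not replace a situated assessment of what agents are actually able to do.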
In the automated vehicle case, the driver has authority to control the vehicle by accelerating, braking and steering, as well as to take over control authority at any time. In the case of AI-based hiring, employers’ authority includes setting a threshold for a passing score and deciding who to hire. Simply giving human agents final authority by design, without ensuring proper ability, is not sufficient to empower humans to act upon their moral responsibility. For example, a driver may have final authority over a fully autonomous car, but the driver’s loss of situational awareness, or even skill degradation as a result of systematic lack of engagement in the driving task, will limit the driver’s ability to exert that control authority [41, 65, 82]. The same might happen for a manager with final authority over who to hire, if they merely sign off on the hiring recommendations of the AI agent, without substantively engaging in the assessment process themselves.
As ability should not be smaller than authority, it is important to build a baseline understanding of the abilities of human and AI agents and evaluate their consistency with the control authority provided by the system’s design. From the human side, human factors literature can support the identification of a realistic baseline on human ability by applying psychological and physiological principles to understand challenges that are likely to arise in human-AI interaction [82, 84]. From the AI side, a proper understanding of ability should not only be task-oriented (e.g., measuring performance on data sets against benchmarks), but also behavior-oriented. Approaches to understanding AI ability in context include those inspired by human cognitive tests, information theory, and ethology (the study of animal behavior). Designing for appropriate authority and ability also requires us to expand the scope of design from human-AI interactions to social and organizational practices. Human training, oversight procedures, administrative discretion, and policy are just a few examples of organizational elements that significantly determine and shape agents’ authority and ability.
Design, training and technological development may “expand” or “shrink” agents’ abilities through innovation, including training humans for new skills and equipping AI agents with new technological capabilities, or achieving more through interaction between humans and AI and their combined abilities. From the AI side, especially for machine learning-based systems, the relation between the input data and the target variable may change over time; concept drift methodologies can be applied to identify such new situations, which might impact the AI agent’s ability to respond appropriately [86, 87]. From the human side, interaction with technology might lead to behavioral adaptation and unwanted situations, e.g., speeding when driving with intelligent steering assistance provided by an automated vehicle, decreasing the human’s ability to keep the system within the moral ODD. In such situations, the human-AI system might move to a fallback state or attract the driver’s attention back to the supervision task, thus restoring the driver’s ability to act upon their ultimate responsibility for the vehicle’s operation.
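As a rough illustration of how a change in the AI agent’s ability might be surfaced to humans, the sketch below compares the error rate in a recent window of outcomes with that of a reference window. This is a deliberately simple window-comparison heuristic chosen for illustration; it is not the method of the cited concept drift literature, and the function name `drift_suspected` and the `tolerance` value are our assumptions.

```python
from statistics import mean
from typing import Sequence


def drift_suspected(
    reference_errors: Sequence[int],
    recent_errors: Sequence[int],
    tolerance: float = 0.10,
) -> bool:
    """Flag possible drift when the recent error rate exceeds the
    reference error rate by more than `tolerance`.

    Each sequence holds 1 for an erroneous output and 0 for a correct one.
    """
    if not reference_errors or not recent_errors:
        raise ValueError("both windows must contain at least one outcome")
    return mean(recent_errors) - mean(reference_errors) > tolerance


# Illustrative outcome windows (hypothetical data).
reference = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]  # 2 errors out of 10
recent = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]     # 5 errors out of 10
needs_human_attention = drift_suspected(reference, recent)
```

In line with the property discussed here, the result is named `needs_human_attention`: the sketch only raises a signal, and deciding how to respond (e.g., retraining, restricting the system’s authority, or moving to a fallback state) is left to the relevant humans.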
Shared control is a promising approach to keep a balance between control ability and authority, with relevant applications in the domains of automated vehicles, robot-assisted surgery, brain-machine interfaces, and learning. In shared control, the human(s) and the AI agent(s) are interacting congruently in a perception-action cycle to perform a dynamic task, i.e., control authority is not attributed either to the human or to the AI agent, but is shared among them. Shared control could be particularly useful in human-AI systems that need to act in complex situations that can rapidly change beyond the envisioned moral ODD, and where rapid human adaptation and intervention is needed.
Property 4. Actions of the AI agents are explicitly linked to actions of humans who are aware of their moral responsibility
Satisfying the first three properties ensures that relevant humans are capable of acting upon their moral responsibility (property 3), are aware of the moral implications of the system’s actions (property 1), and have shared representations with AI agents (property 2). Yet, what is left undiscussed is the requirement to ensure that the effects of the system’s actions are traceable to the relevant humans’ moral understanding.
To trace any consequence of the human-AI system’s operation to a proper moral understanding of relevant humans, there should be explicit, explainable and inspectable link(s) between actions of the system and corresponding human morally loaded decisions and actions. We acknowledge that such link(s) might be a more demanding form of tracing than what was originally proposed; nevertheless, we deem it necessary to enable the tracing condition to be inspectable. Furthermore, we argue that moral understanding of the system’s effects should be demonstrated by, at least, those humans who make decisions with moral implications on the design, deployment, or use of the system, even if the actions that bring a human decision to life are executed by the AI agent. Hence, all relevant human decisions related to, e.g., design, use, and policy must be explicitly logged and reported, in order to link actions of the AI agents to relevant decisions, preferences, or actions of humans who are aware of the system’s possible effects in the world.
Even if all relevant humans made their decisions responsibly and with full awareness of their possible moral implications, the lack of a readily identifiable link from a given action of the AI agent to the underlying human decisions would still result in loss of tracing. The links between actions of the systems and corresponding human morally loaded decisions and actions need to be explicitly identifiable in two ways (Fig. 5):
Forward link: whenever a human within the human-AI system makes a decision with moral implications (e.g., on the design, deployment, or use of the system), that human should be aware of their moral responsibility associated with that decision, even if the actions that bring this decision to life are executed by the AI agent.
Backward link: for any consequence of the actions of the human-AI system, the human decisions and actions leading to that outcome should be readily identifiable.
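The two links can be sketched as a minimal decision log, shown below under stated assumptions. The types `HumanDecision` and `DecisionLog`, their fields, and the hiring example entries are hypothetical. In particular, a boolean `responsibility_acknowledged` flag is only a crude stand-in for the moral awareness that the forward link requires.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class HumanDecision:
    decision_id: str
    decided_by: str
    description: str
    responsibility_acknowledged: bool  # crude proxy for the forward link


@dataclass
class DecisionLog:
    decisions: dict[str, HumanDecision] = field(default_factory=dict)
    action_links: dict[str, list[str]] = field(default_factory=dict)

    def record_decision(self, decision: HumanDecision) -> None:
        """Forward link: only log decisions whose maker acknowledged responsibility."""
        if not decision.responsibility_acknowledged:
            raise ValueError("decision lacks acknowledged moral responsibility")
        self.decisions[decision.decision_id] = decision

    def record_action(self, action_id: str, decision_ids: list[str]) -> None:
        """Refuse to log an AI action that is not linked to logged human decisions."""
        unknown = [d for d in decision_ids if d not in self.decisions]
        if not decision_ids or unknown:
            raise ValueError(f"action {action_id} is not linked to logged decisions")
        self.action_links[action_id] = list(decision_ids)

    def trace_back(self, action_id: str) -> list[HumanDecision]:
        """Backward link: return the human decisions behind an action."""
        return [self.decisions[d] for d in self.action_links[action_id]]


# Illustrative use in the AI-based hiring scenario (hypothetical entries).
log = DecisionLog()
log.record_decision(
    HumanDecision("D1", "hiring manager", "set passing score threshold", True)
)
log.record_action("A1: candidate rejected by score", ["D1"])
responsible_humans = [d.decided_by for d in log.trace_back("A1: candidate rejected by score")]
```

Rejecting unlinked actions at logging time, rather than reconstructing links after an incident, reflects the point that the backward link should be readily identifiable.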
Enabling the forward link from human moral understanding to AI agents’ actions relates to the epistemic condition (also called knowledge condition) of moral responsibility, which posits that humans should be aware of their responsibility at the time of a decision [3, 92]. Hence, the human-AI system should be designed in a way that simplifies and aids achieving moral awareness. This requires explicit links between design choices and stakeholder interpretations of moral reasons that are at stake. Values hierarchies  provide a structured and transparent approach to map relations between design choices and normative requirements. A value hierarchy visualizes the gradual specification of broad moral notions, such as moral responsibility, into context-dependent properties or capabilities the system should exhibit, and further into concrete socio-technical design requirements. Such a structured mapping can equip stakeholders with the means to deliberate design choices in a manner that explicitly links each choice to relevant aspects of moral responsibility. These deliberations, as well as the accompanying rich body of empirical and conceptual research must be well documented, inspectable, and legible. This kind of transparency also supports the backward link between the system’s actions and the design choices made by relevant humans.
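A value hierarchy of the kind described above can be represented as a simple tree in which each design requirement can be traced upward to the norm and value that justify it. The following is a minimal sketch: the `Node` type, the level names, and the hiring-related labels are illustrative assumptions rather than content of the cited approach.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str
    level: str  # "value", "norm", or "design requirement"
    parent: Optional[Node] = field(default=None, repr=False, compare=False)
    children: list[Node] = field(default_factory=list)

    def specify(self, label: str, level: str) -> Node:
        """Add a more concrete specification of this node and return it."""
        child = Node(label, level, parent=self)
        self.children.append(child)
        return child

    def justification(self) -> list[str]:
        """Walk upward from this node to the root value."""
        path = []
        node: Optional[Node] = self
        while node is not None:
            path.append(node.label)
            node = node.parent
        return path


# Hypothetical hierarchy for the AI-based hiring scenario.
value = Node("non-discrimination in hiring", "value")
norm = value.specify("assessment scores are explainable to job seekers", "norm")
requirement = norm.specify("show a per-competence score breakdown", "design requirement")

why = requirement.justification()
```

Here `why` lists the requirement first, followed by the norm and the value it specifies, which is one way of keeping the link between a design choice and its normative grounds inspectable.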
Furthermore, requirements such as explainability of the system’s actions can be essential in effectively empowering human moral awareness. Since its early works, the field of explainable AI has increased its scope from explaining complex models to technical experts towards placing the target audience as a key aspect. Given a certain human or group of humans as target audience, we see explainability in the context of supporting the forward link as clearly presenting the link between the system’s actions and human moral awareness, as well as their alignment to the moral ODD. For example, consider an automated vehicle which slows down and pulls off the road after it recognizes a car accident. Right after that, the vehicle should remind the driver of their duty to provide assistance to possible victims of the accident. In the context of AI-based hiring, explanations of assessment scores in language that directly links observed job seeker performances to job-specific meanings of competence can help employers, job seekers, designers, and developers better outline the boundaries of the moral ODD during the design phase. For example, this can help reveal whether there is misalignment between conceptions of competence among the human agents and the AI algorithm.
In complex socio-technical environments the establishment of links between human moral awareness and actions of a human-AI system is complicated by the “problem of many hands”: when more than one agent contributes to a decision, it becomes less clear who is morally and legally responsible for its consequences. The “problem of many things” complicates this further: there are not only many (human) hands, but also many different technologies interacting and influencing each other, be it multiple AI agents or the interplay between sensors, processing units, and actuators. In case of unintended consequences of the AI agent’s actions, this complexity can hinder the backward link, i.e., tracing the responsibility back to individual human decisions. This challenge calls for systemic, socio-technical design interventions that jointly consider social infrastructure (e.g., organizational processes, policy), physical infrastructure, and the AI agents that are part of these infrastructures.
Recent developments using information theory to quantify human causal responsibility can provide relevant insight for the design and development of appropriate forward and backward links, by providing a model with which hypotheses can be tested. However, simplifying assumptions used in this research need to be addressed to account for more realistic settings. Methods from the social sciences, e.g., Actor-Network Theory (ANT), can support the tracing of networks of association amongst many actors, which can help us understand how, for example, humans may offload value-laden behavior onto the technology around them. The “sociology of a door closer”, for instance, describes how door closers were made the element in the assembly that manifests politeness by ensuring that the door closes softly and gradually, even as the human actors may barge through without any action to regulate the door. This sort of division of moral labor should not be done mindlessly: it requires human decisions, and their relation to the moral ODD, to be carefully analyzed.
Although establishing explicit links between human decisions, human moral awareness, and actions of the AI agents is challenging, these links allow appropriate post hoc attribution of backward-looking responsibility for unintended consequences, helping to avoid responsibility gaps and prevent similar events from repeating in the future. They also facilitate forward-looking responsibility by creating an incentive for the relevant humans to proactively reflect on the consequences of their decisions (design choices, operational control, interactions, etc.).
Summary of the four properties
We summarize the proposed properties of systems under meaningful human control as follows:
Property 1: The human-AI system has an explicit moral operational design domain (moral ODD) and the AI agent adheres to the boundaries of this domain.
Property 2: Human and AI agents have appropriate and mutually compatible representations of the human-AI system and its context.
Property 3: The relevant humans and AI agents have ability and authority to control the system so that humans can act upon their responsibility.
Property 4: Actions of the AI agents are explicitly linked to actions of humans who are aware of their moral responsibility.
In our view, these properties are constructive as well as open: they can serve as practical tools for supporting the design, development and evaluation of human-AI systems, while being applicable to diverse types of systems (as illustrated by the cases of automated vehicles and AI-based hiring).
Although the properties are not sufficient for a system to be under meaningful human control, we deem them necessary from a design perspective: while a system developed to possess all these properties may still not be fully under meaningful human control, we believe that completely missing one of these properties would imply that the human-AI system is not under meaningful human control. Moreover, each property is non-binary and necessarily multidimensional. Consequently, improving the system to some extent according to one or more of the four properties will lead to better tracking or tracing, and therefore, more meaningful human control over that system. That said, defining “how much of each property is sufficient” in a given context would generally require a thorough qualitative and situated analysis.
Furthermore, these four properties in themselves do not immediately translate to concrete design guidelines; metrics, algorithms, and methodologies needed to implement the properties are context- and system-specific. Yet, the properties provide explicit anchors for connecting to existing frameworks and methodologies across the design and engineering domains.