Using formal methods to reason about taskload and resource conflicts in simulated air traffic scenarios
In complex environments, like the modern air traffic system, interactions between human operators and other system agents can profoundly impact system performance. System complexity can make it difficult to determine all of the situations where issues can arise. Simulation and formal verification have been used separately to explore the role of humans in complex systems. However, both have problems that limit their usefulness. In this paper, we describe a method that allows interesting conditions related to human taskload and resource conflicts between agents to be discovered and evaluated in high fidelity through the synergistic use of formal verification and simulation. The core of this method is based on a formal modeling architecture that represents original, agent-based simulation constructs using computationally efficient abstractions that ensure the temporal and ordinal relationships between simulation events (actions) are represented realistically. Taskload for each agent is represented using a priority queue model where only a limited number of actions can be performed or remembered by a human at a given time. Resources affected by agent behaviors are associated with actions so that resources can be reasoned about at the action level. We discuss our method and its formal architecture. We describe how the method can be used to find taskload and resource conflict conditions through the use of formal, checkable specification properties. We then use a simple air traffic example to demonstrate the ability of our method to find interesting taskload and resource conflict conditions around a simulation trace. The implications of this method are discussed and directions for future work are explored.
Keywords: Formal methods · Simulation · Model checking · Human–automation interaction · Taskload · Workload · Mode confusion · Air traffic control · Agent-based simulation · Autonomy
A number of problems are associated with human–automation interaction in complex systems [2, 17, 53]. High demands on human resources can lead to excessive taskload; work that fails to engage the human (such as monitoring) can lead to a loss of vigilance; and poorly designed interfaces may be incompatible with, or contradictory to, the human operator’s task. Further, the complexity of a system can make it difficult for humans to track system state. This can lead to situations where the system does things the human does not expect or human actions result in unintended outcomes, both of which can lead to large-scale system disasters. All of these conditions can result in human error and system failures. As systems become more autonomous, these types of problems are expected to become worse. In this work, we are predominantly concerned with taskload and with concurrency issues that can lead to human operator confusion.
Human operator taskload is a measure of the number of tasks a human operator is expected to perform at a given time [43, 44]. Taskload has shown itself to be a good indicator of human operator mental workload (the total mental effort being expended by a person at a given time) in the air traffic system [25, 29, 43, 44, 60]. It is therefore important that participants in this system avoid excessive levels of taskload to prevent human error and reduced performance. However, determining when taskload can become excessive, and what the performance implications of that taskload are, can be very challenging because of the many different people, machines, and environmental conditions that can interact during operations. Further, due to the many different operating conditions that can occur, experiments and tests with real systems and human subjects are infeasible.
In complex systems that contain humans, physical elements, and computational elements (agents) in a distributed environment, confusion about the value of shared resources can have profound implications for system safety. Specifically, if the human cannot keep track of the system state, he or she may behave erroneously or be surprised by system behavior and behave unpredictably [41, 52]. These types of concurrency problems can be thought of in terms of resources. Specifically, in systems with concurrency, agents can often have shared resources (variables and/or data) that they can access and/or modify. If a resource is modified concurrently by two agents, or modified by one agent while others are accessing it, a human agent may become confused and behave erroneously because a resource value does not match his or her expectations.
To address these problems, researchers have been building simulation environments such as Work Models that Compute (WMC) that allow human operator taskload and system performance to be analyzed in a variety of air traffic simulations. Such simulations are more flexible than human subject experiments and real-world tests. They are also able to represent complex system concepts in high fidelity. However, they follow a traditional experimental approach and are thus not exhaustive: they can still miss potentially dangerous or performance-critical operating conditions that did not happen to occur in one of the explored scenarios. Researchers have also been exploring how formal verification such as model checking (a means of performing formal verification based on exhaustive/complete searches of system models) can be used to prove whether or not concurrency issues can produce confusing or dangerous conditions in systems that rely on human–automation interaction [7, 61]. By virtue of using proofs, these analyses are exhaustive and will thus not miss conditions that can manifest in the evaluated models. However, this completeness comes at the expense of scalability, which can severely limit what systems can be analyzed.
In this work, we strove to develop a new method for using simulation and formal verification synergistically in a way that exploits the advantages of both approaches. Specifically, this approach gives analysts the ability to use model checking’s exhaustive search capabilities to explore the region around a simulated air traffic scenario to find excessive human taskload conditions and resource conflicts that may have previously been undiscovered. Discovered conditions can then be more deeply explored using the high-fidelity simulation analysis. By exploiting each method’s strengths, this work makes a significant contribution in that it allows formal analyses to be used with systems for which they were not previously possible. In this paper, we describe how this method was realized. First, we discuss the background necessary to understand our method. We then discuss the formal modeling architecture we developed for encapsulating WMC concepts. We describe the specification properties that we can use with the model to generate traces. Finally, we demonstrate the capabilities of our method by showing how verification results can be used to create simulation scenarios for an air traffic application.
Work Models that Compute (WMC) is a simulation framework that dynamically models complex, multi-agent concepts of operations and work domains. WMC attempts to model the collective work of a set of agents. It consists of two parts: a work model that describes the work of a given domain, and an engine that simulates the work model. Each work model comprises three primary elements: agents, actions, and resources. Resources are defined as a collection of specific elements of the work environment which can be sensed and manipulated by the agents. Actions manipulate resources, are linked to a specific agent, and represent the work at its most atomic unit. The work model specifies each action’s frequency, priority, duration, the resources it needs or manipulates, and which agents are involved. Agents serve the dual purpose of organizing actions and adding a layer of dynamics to the prescribed action sequence by placing limits on both the number of simultaneously performed actions and their priorities.
A scenario pulls all elements of work models, agents, actions, and resources into a simulation. This can be used to generate an action trace and other higher-level metrics of interest. The simulation engine works on a hybrid timing mechanism that allows WMC to incorporate features of both continuous time and event-based simulation. This enables WMC to simulate dynamic systems (such as aircraft dynamics) and event-based agents (such as pilot models) [28, 48].
WMC provides an excellent framework for the work discussed in this paper. It allows for the modeling of humans and automated agents, while accounting for the resources each requires and modifies over the course of their work. It also accounts for human operator taskload. Further, while traditional simulations represent actions as if they are the sole responsibility of specific agents, WMC assigns actions to the work model. Actions are then executed by whichever agents are available. This allows for a full investigation of work as a fully defined object.
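For illustration, the relationship between work models, agents, actions, and resources can be sketched as data structures (a minimal Python sketch; WMC itself is a C++ framework, and all names here are illustrative rather than WMC’s actual API):

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """A specific element of the work environment that agents can
    sense and manipulate."""
    name: str
    value: float = 0.0

@dataclass
class Action:
    """The most atomic unit of work: linked to a specific agent and
    manipulating a set of resources."""
    name: str
    agent: str
    priority: int
    duration: float                      # time needed to set its resources
    resources: list = field(default_factory=list)

@dataclass
class WorkModel:
    """A work model pulls together agents, actions, and resources."""
    agents: list
    actions: list
    resources: list

# A toy work model: an air traffic controller clearing one aircraft to descend.
clearance = Resource("descent_clearance")
issue = Action("issue_clearance", agent="atc", priority=2, duration=1.0,
               resources=[clearance])
model = WorkModel(agents=["atc"], actions=[issue], resources=[clearance])
```

Note that, as described above, the action belongs to the work model and merely names an agent; a different available agent could in principle execute it.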
2.2 Modeling taskload in aerospace systems
The current version of WMC does not support the ability to evaluate human operator taskload. However, earlier simulations have researched this capability using the concepts around which WMC is built [43, 44]. In this formulation, each modeled human agent has two priority queues: one representing actions that are active (currently being executed) and one representing actions that are inactive. Inactive actions can have two designations. Those that have never been executed are designated as delayed, and actions that were previously active but are now inactive are designated as interrupted. The active queue has a limited capacity which results in actions transitioning between queues. If a human agent is assigned new actions, those actions are put in the inactive queue and given the delayed designation. If there is room in the active queue, the highest priority actions (those with the highest explicit priority with the shortest execution time as determined by the action’s resources) are moved to the active queue. If there are active actions with lower priorities than those in the inactive queue, those lower-priority actions are moved to the inactive queue and designated as delayed, and the higher priority actions are moved up into the active queue. As actions are finished, they are removed from the active queue and rescheduled for later (if they occur again) or to never occur again. Within this infrastructure, taskload can be described as the complete set of actions in all of an agent’s queues. While this approach to taskload has its limitations, it has proven itself useful in a number of aerospace applications [25, 29, 43, 44, 60].
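The two-queue formulation described above can be sketched as follows (a simplified Python illustration; the actual formulation also breaks priority ties using execution time, which this sketch omits):

```python
class HumanAgent:
    """Two-queue taskload model: a bounded active queue plus an inactive
    queue holding delayed (never executed) and interrupted (preempted)
    actions."""

    def __init__(self, active_capacity):
        self.active_capacity = active_capacity
        self.active = []        # (priority, name) pairs currently executing
        self.inactive = []      # (priority, name, designation) triples

    def assign(self, priority, name):
        # New actions enter the inactive queue with the 'delayed' designation.
        self.inactive.append((priority, name, "delayed"))
        self._shuffle()

    def _shuffle(self):
        # Promote the highest-priority inactive actions; preempted active
        # actions become 'interrupted'.
        while self.inactive:
            self.inactive.sort(key=lambda a: -a[0])   # highest priority first
            p, n, _ = self.inactive[0]
            if len(self.active) < self.active_capacity:
                self.active.append((p, n))
                self.inactive.pop(0)
            else:
                self.active.sort(key=lambda a: a[0])  # lowest priority first
                if self.active[0][0] < p:
                    low_p, low_n = self.active.pop(0)
                    self.inactive.pop(0)
                    self.active.append((p, n))
                    self.inactive.append((low_p, low_n, "interrupted"))
                else:
                    break

    def taskload(self):
        # Taskload: the complete set of actions across all of the queues.
        return len(self.active) + len(self.inactive)
```

For example, an agent with an active capacity of one who is performing a low-priority monitoring action and is then assigned a higher-priority clearance action ends up with the monitoring action interrupted and a taskload of two.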
2.3 Formal verification
Formal verification is an analysis technique that falls within the discipline of formal methods. Formal methods are well-defined mathematical languages and techniques for the specification, modeling, and verification of systems. Specification properties mathematically describe desirable system conditions. Systems are modeled using mathematically based languages, and verification then mathematically proves whether or not the model satisfies the specification. Model checking is an automated approach to formal verification, whereby a formal model describes a system as a state transition model: a set of variables and transitions between variable states. Desirable specification properties are usually represented in a temporal logic. Verification is performed automatically by exhaustively searching a system’s state space to determine whether these properties hold. If they do, the model checker returns a confirmation. Otherwise, a counterexample is produced. This shows how the specification violation occurred as a trace through the state space of the model.
Formal verification has been successfully used to evaluate human–automation interaction in a number of different capacities [7, 61]. Most relevant to this discussion is the research that has focused on workload and mode confusion.
There is a significant body of literature that focuses on using formal methods to find potential mode confusion in human–automation interaction. However, this work has predominantly used models of human-machine interfaces [12, 13, 14, 34, 36], concurrently executing human mental models and automation models [11, 20, 21, 50, 51, 54], or concurrently executing human task knowledge and automation models [8, 9, 10, 62]. As such, these analyses have almost exclusively focused on systems with single human operators and have not accounted for the interaction of multiple humans and automated agents in a distributed system.
Comparatively fewer papers have explored how human workload and taskload can be considered in formal methods. The sole contributions have come from Mercer and Goodrich et al. [40, 56], who have researched ways of formally modeling workload. However, they have not used these methods in formal verification analyses.
No matter the analysis subject, formal verification techniques like model checking suffer from combinatorial explosion, where the state space grows exponentially as additional components are added to the model. This can lead to models that are too big or take too long to be verified. Because of this scalability limitation, complex system behaviors must often be represented abstractly. This can limit the types of system behaviors and interactions that can be considered in a given analysis.
2.4 Formal verification and simulation in combination
3 Our method
In this work, we attempted to develop a method that gives analysts the ability to use model checking’s exhaustive search capabilities to explore the region around a simulated air traffic scenario to find excessive human taskload conditions and/or resource conflicts that may not have been discovered otherwise. We further wanted to give analysts the ability to analyze the discovered scenario condition more deeply using the high-fidelity air traffic simulation.
A WMC work model (which describes the agents in a simulation, along with the actions they perform and the resources they modify) and a scenario (which describes the initial conditions that represent a specific air traffic situation and future events that can occur) are run through a WMC simulation. The simulation produces a trace showing exactly how that scenario evolved.
The work model, scenario, and simulation trace are then translated into a formal model representing the simulation over a constrained period of time (a time window). In WMC, the time at which actions occur and the duration of those actions are critical to the manifestation of taskload. Thus, in the formal model, the timing of actions (when they occur and for how long they occur) is explicitly represented. This representation can include analyst-defined variance, nominally on the order of 1–3 s, to allow the model checker to explore the performance space around the modeled scenario. The translator also generates a set of specification properties designed to find interesting taskload conditions in the model.
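The windowing and timing-variance step can be sketched as follows (illustrative Python; the field names are assumptions, not WMC’s actual trace format):

```python
def windowed_events(trace, t_min, t_max, variance=2.0):
    """Select the trace events inside the analysis time window and attach
    the [time - variance, time + variance] interval from which the model
    checker may pick an occurrence time nondeterministically (variance
    nominally on the order of 1-3 s)."""
    window = [e for e in trace if t_min <= e["time"] <= t_max]
    return [{"action": e["action"],
             "earliest": max(t_min, e["time"] - variance),
             "latest": min(t_max, e["time"] + variance)}
            for e in window]
```

Clipping the intervals to the window keeps the formal model focused on the selected span while still letting the model checker explore the performance space around the observed timings.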
A model checker is used to explore the formal model to generate traces illustrating violations of specifications.
The traces are then translated back into WMC scenarios for deeper analyses in its simulation environment.
3.1 Formal modeling architecture
Table 1 Variables that define the action data type:
- A unique action identification as an integer from 1 to N, where there are N total actions
- The identification of the agent responsible for the action
- The priority queue location and status of the action: whether it is active, delayed, interrupted, or notAssigned (the state of an action that has yet to be assigned or has been finished and rescheduled)
- The priority level of the action as a bounded integer
- The time left for the action to finish executing (initial times are determined by the time it takes to set an action’s resources in the original WMC model)
- The next time the action will be assigned (determined by the observed timings in the simulation trace)
The architecture for accomplishing this is shown in Fig. 2. In this, the formal model is represented by three synchronously composed modules: a Scheduler, Actions, and Agents. The Scheduler keeps track of modeled time, determines when actions are assigned to agents, and coordinates the behavior of the other modules based on the scheduler’s status. The Actions module is actually a collection of synchronously composed action submodules that represent actions from the WMC simulation. Similarly, Agents is a collection of synchronously composed agent submodules that represent agents from the WMC simulation.
The Actions and Agents modules communicate with each other via arrays of action and agent data types (actions and agents from Fig. 2). Each action and agent is associated with an instance of its respective data type. The action data type contains all of the variables shown in Table 1, while the agent data type is defined by the variables in Table 2.
Our architecture abstracts away the WMC concept of resources in service of computational efficiency. However, what resources are being modified at any given time can be inferred from the actions that are executing (information that is readily available from the WMC model and scenario).
The following describes the details of each of the elements in our architecture.
The Scheduler module is responsible for maintaining the clock and communicating the globalTime to the other modules. It also indicates when notAssigned actions with update times at the current globalTime are ready to be executed via the assignment variable. Finally, the Scheduler uses its status to coordinate the behavior of the other modules.
Table 2 Variables that define the agent data type:
- A unique agent identification as an integer from 1 to M, where there are M total agents
- The agent’s active priority queue capacity
- The number of active actions the agent is responsible for (the number of actions in the agent’s active priority queue)
- The active action the agent is responsible for that has the minimum priority (smallest priority and longest time) of all such actions
- The delayed or interrupted action (an action in the inactive queue) the agent is responsible for that has the maximum priority (greatest priority and shortest time) of all such actions
The Agents module is a composition of synchronously composed agent submodules, where each agent manages the values of its corresponding agent data type. Conceptually, each agent is responsible for keeping track of the number of its active actions (activeCount). It also provides information each action will need for moving between execution states (moving between priority queues) in the form of its minActive and maxInactive variables.
To compute activeCount, an agent submodule uses the formula in (2) with \( actionState = active \). In the formal model, the code for doing this operation is automatically generated with a known bound on the number of possible actions (N). This operation is therefore linear and scales efficiently. It is also important to note that this same equation (2) can be used to compute the number of actions that are in the delayed and interrupted priority queues, even though these are not explicitly represented in the formal model.
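Conceptually, the generated counting operation is a single bounded scan over the N actions, as in this sketch (illustrative Python; the actual model is expressed in the model checker’s input notation):

```python
def count_in_state(actions, agent_id, state):
    """Count the actions owned by agent_id that are in the given queue
    state. A single pass over the N actions, so the generated code is
    linear in N."""
    return sum(1 for a in actions
               if a["agent"] == agent_id and a["state"] == state)
```

The same scan with `state` set to `"delayed"` or `"interrupted"` counts the implicit inactive queues, even though they are not explicitly represented in the formal model.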
To compute minActive and maxInactive, the agent first uses \(\lambda \) calculus to compute sets containing all of the action ids that satisfy the minimum active and maximum inactive criteria (minActiveSet and maxInactiveSet, respectively). The minActiveSet is computed as shown in (3) where, for all action ids i in a set of actionIDs, i is in the set if the action with \( id = i\) is active, associated with the given agent, and has a priority less than or equal to all other active actions associated with the agent. maxInactiveSet is computed as shown in (4) where, for all action ids i in a set of actionIDs, i is in the set if the action with \( id = i\) is interrupted or delayed, associated with the given agent, and has a priority greater than or equal to all other delayed or interrupted actions associated with the agent. With these sets computed, the action’s minActive and maxInactive are selected from the action ids in minActiveSet and maxInactiveSet, respectively. This allows for non-determinism in what actions will ultimately be active or inactive at any given time if the actions have the same priority.
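The set computations in (3) and (4) can be sketched as follows (a Python illustration using priority only; the full criteria also use remaining execution time to order actions of equal priority):

```python
def min_active_set(actions, agent_id):
    """Ids of the agent's active actions whose priority is <= that of
    every other active action of that agent. Ties are kept in the set,
    giving the model checker a nondeterministic choice."""
    own = [a for a in actions
           if a["agent"] == agent_id and a["state"] == "active"]
    if not own:
        return set()
    lowest = min(a["priority"] for a in own)
    return {a["id"] for a in own if a["priority"] == lowest}

def max_inactive_set(actions, agent_id):
    """Ids of the agent's delayed or interrupted actions whose priority
    is >= that of every other such action of that agent."""
    own = [a for a in actions
           if a["agent"] == agent_id
           and a["state"] in ("delayed", "interrupted")]
    if not own:
        return set()
    highest = max(a["priority"] for a in own)
    return {a["id"] for a in own if a["priority"] == highest}
```

The agent’s minActive and maxInactive values are then drawn nondeterministically from these sets, which is what allows equal-priority actions to change places freely in the analysis.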
- If status is doing and that action’s state is active, then the action’s time is decremented by the amount elapsed since the clock was last updated. If this means that the action has finished (its time becomes 0), the action’s state is set to notAssigned and its update time is set to the action’s next update time from the original simulation trace. To add non-determinism to the timing of actions, variance can be included in the update time.
- If status is assigning and the action is in the set of assigned actions, then the action’s state is set to delayed and the action’s time is updated. Non-deterministic amounts of time variance can also be added in this assignment.
- If status is shuffling, then:
  - If the action’s state is delayed or interrupted, it is its agent’s maxInactive action, and either it has a higher priority than its agent’s minActive action or its agent’s active capacity has not been reached, then the action’s state is set to active.
  - If the action’s state is active, it is its agent’s minActive action, and the agent’s active capacity has been exceeded, then the action’s state is set to interrupted.
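The two shuffling rules can be sketched for a single action as follows (illustrative Python; the field names are assumptions based on the variables in Tables 1 and 2):

```python
def shuffle_step(action, agent):
    """Apply the two shuffling rules to one action. The agent record
    carries its maxInactive/minActive action ids, the priority of its
    minActive action, its activeCount, and its active queue capacity."""
    if (action["state"] in ("delayed", "interrupted")
            and action["id"] == agent["maxInactive"]
            and (action["priority"] > agent["minActivePriority"]
                 or agent["activeCount"] < agent["capacity"])):
        action["state"] = "active"       # promote into the active queue
    elif (action["state"] == "active"
          and action["id"] == agent["minActive"]
          and agent["activeCount"] > agent["capacity"]):
        action["state"] = "interrupted"  # demote into the inactive queue
    return action
```

Because every action submodule applies these rules against the same shared agent record, promotions and demotions stay consistent across the synchronous composition.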
3.2 Specification properties and analysis capabilities
Our architecture gives us the ability to model sections of simulation traces with included variance in the timing of actions. This is useful because it lets us write specification properties that assert the absence of the specific taskload and resource conflict conditions we are interested in exploring. With model checking, we can then use these properties to generate counterexamples showing exactly how the conditions of interest occur. This enables us to use our method (Fig. 1) to create WMC scenarios to examine the conditions found in the counterexample in the simulation. For our current purposes, we are interested in specifications that concern taskload conditions and potential resource conflicts that can result from concurrency between agents.
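As an illustration of the general shape of such properties (these are not Eqs. (5)–(10) verbatim, and the variable names are assumptions), an overload-finding property for an agent \(a\) can be phrased as an invariant whose counterexample is an overload trace:

```latex
\[
\mathbf{G}\,\neg\bigl(\mathit{activeCount}_a + \mathit{inactiveCount}_a
  > \mathit{threshold}_a\bigr)
\]
```

and, for a get–set pair of actions \(i\) and \(j\), a resource conflict property of the form

```latex
\[
\mathbf{G}\,\neg\bigl(\mathit{state}_i = \mathit{active}
  \wedge \mathit{state}_j = \mathit{active}\bigr)
\]
```

whose counterexample shows the two actions executing concurrently over the shared resource.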
We have implemented our method (Fig. 1) as a desktop computer application. The program automatically processes WMC scenarios, generates the formal model and specification properties, and converts counterexample results obtained from verifying the properties with a model checker back into WMC scenarios.
In this implementation, a user first runs the WMC simulation to develop a simulation trace. The application then processes the WMC simulation, its scenario, and the simulation trace. The application displays the simulation trace to the analyst, who then selects a time frame (a span of time defined by a minimum and maximum) that will be the focus of the model checking analysis. Once the time frame is selected, the application automatically parses the relevant portion of the trace to identify: (a) all of the actions performed during the time frame; (b) the agents associated with those actions; and (c) all of the resources that are gotten or set by each of the actions.
The analyst then reviews the list of agents to identify their type (human or basic). If an agent is basic, it represents an automated element of the system. It is thus given an “unlimited” active priority queue (a queue with a maximum equal to the total number of actions in the analysis). If an agent is human, then the analyst can set the size of the agent’s active and inactive queues. The analyst can also remove agents from the analysis. This can be a useful feature if automated agent behavior is not a factor in the analysis and the analyst wants to help control for scalability.
When agent editing is completed, the program will modify the list of available actions to remove any actions associated with excluded agents. An analyst can then edit the priorities assigned to actions, the amount of time they take to perform, as well as set variance on action update times and performance times.
When this is completed, the program generates a model checker input file based on the formal modeling architecture (Fig. 2) in the notation of the Symbolic Analysis Laboratory (SAL) for use with its infinite bounded model checker. In this file, generated versions of the specifications from Sect. 3.2 are created. Specifications based on FindActiveLoad, FindInactiveLoad, FindOverload, FindNoLoad, and FindNoOverload [Eqs. (5)–(9)] are generated for each agent included in the analysis. A FindResourceConflict property [Eq. (10)] is generated for each pair of actions where a get–set or set–set conflict could occur.
Once the desired specification has been checked with the model checker and a counterexample produced, an analyst can process the counterexample with our application. This analyzes the counterexample and identifies what time each of the actions needs to be performed at in the simulation to achieve the behavior from the counterexample. The application then generates WMC scenario code (C++), which is used to modify the original WMC scenario. This reschedules each of the actions to achieve the new behavior. The WMC simulation can then be rerun to analyze the impact of this behavior. If desired, an analyst can iteratively use our method with a simulation scenario to progressively investigate a scenario over its entire run.
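The counterexample-processing step can be sketched as follows (illustrative Python; the real application emits C++ WMC scenario code rather than a dictionary, and the trace format here is an assumption):

```python
def times_from_counterexample(steps):
    """Extract, from a counterexample trace of (time, action, state)
    steps, the first time each action becomes active: the times at
    which the rescheduled WMC scenario must perform those actions."""
    times = {}
    for t, action, state in steps:
        if state == "active" and action not in times:
            times[action] = t
    return times
```

These extracted times are then used to reschedule the corresponding actions in the original scenario so that rerunning the simulation reproduces the counterexample’s behavior.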
To test our method, we wanted to show, first, that we could use our different classes of specification properties to find the intended simulation conditions. Second, we wanted to show that we could use counterexamples generated with the method to produce the desired behavior in the simulation. For these tests, we used an air traffic control application. Specifically, we employed scenarios from the AAR Studies (Authority, Autonomy, and Responsibility Studies; a series of simulations) [32, 33]. These scenarios simulate three aircraft attempting to land on runway RWY18R at Amsterdam Schiphol Airport under the supervision of an air traffic controller, with different distributions of authority and responsibility between the aircraft, the aircraft pilots, and the air traffic controller. For the tests reported here, we examined a balanced distribution (termed FA3-FA3), which defined a split of responsibility between both parties. This maintains the idea that while both ground and aircraft have their own responsibilities, they maintain full authority for their responsibilities. It also ensures that there is no mismatch between who is completing the action and who is responsible for the success of the action. This ensures our tests are only concerned with actions associated with the landing tasks as opposed to actions associated with the monitoring of other agents’ behaviors.
When this simulation scenario was originally run through WMC, it generated a trace with 199,204 simulation actions that occurred over 676.33 s of simulated time. Our testing was concerned with determining that our new method was capable of analyzing and modifying WMC traces, and not necessarily analyzing an entire WMC simulation. Thus, the tests only focused on analyzing the first 5 s of the original simulation scenario, checking properties against a formal model of that period, and determining that the modified scenario replicated the behavior found in the counterexamples generated from the checking of the properties. Note that 5 s was chosen for this analysis because, when the original trace was examined, actions tended to cluster together. The first 5 s of the simulation encapsulated the first cluster of actions. This involved the air traffic controller managing all three aircraft’s progress to waypoints and clearing the aircraft for descent while the pilots direct their aircraft to the waypoint.
Although the time span considered in our test was small, the model had significant complexity from the other elements of the system. Specifically, our analyses included all four human agents from the original scenario: the pilots from aircraft 1, 2, and 3 as well as the air traffic controller. Further, in this analyzed segment, there were nine human actions: six for the air traffic controller, and one for each pilot. The air traffic controller had separate actions for managing the progress of each aircraft to the waypoint and for clearing each aircraft for descent. Each aircraft pilot was capable of directing his or her aircraft to the waypoint.
Because the original scenarios did not contain priority queue limit information, these had to be assigned for the formal model. For this, each agent was given an active queue with a capacity of two and an inactive queue with a capacity of three (keeping each agent’s total memory capacity within seven plus or minus two). Each action was treated as taking one second to perform. No timing variance was employed in our tests. However, the non-determinism that could result from similarities in priorities between actions was accounted for.
The SAL file created from the scenario contained generated properties for FindActiveLoad, FindInactiveLoad, FindOverload, FindNoLoad, and FindNoOverload [Eqs. (5)–(9)] for each agent. Three FindResourceConflict specifications using Eq. (10) were generated (all based on potential get–set conflicts): one for each aircraft, where the conflict occurs when the air traffic controller’s actions for managing an aircraft’s progress toward the waypoint and the pilot’s action for directing the corresponding aircraft to the waypoint occur at the same time.
Simulations and verifications were performed on a computer workstation with a 3.6 GHz Intel Xeon processor and 128 GB of RAM. Verifications were run on a Linux desktop using SAL’s infinite bounded model checker, which requires a depth (a bound) on the number of transitions considered in a given analysis. For large bounds, verification can take a significant amount of time. Thus, for all verifications, depths started at 10 and were iteratively increased by 5 until either a counterexample was found or the analyst was satisfied that the property would not be violated.
4.1 Finding counterexamples
The generated specifications have three distinct forms: those for finding instances of load [derived from Eqs. (5)–(7)], those for finding instances of a lack of load [based on Eqs. (8) and (9)], and those for finding resource conflicts [created from Eq. (10)]. Thus, while every property type has been tested for its efficacy, the results discussed here focus on one property of each form. Further, because the pilots in the modeled section only had one action each, it made little sense to consider issues related to their load. Thus, load-based specifications were only checked for the air traffic controller.
To start, we attempted to check whether the air traffic controller could ever be overloaded [by checking Eq. (7)]. When verification was performed, a counterexample was found at depth 10 (verification time: 16.56 s). This showed that overload could occur in the scenario as originally represented in the WMC scenario.
To find a condition where overload could be avoided, we reran the verification with Eq. (9). This produced no counterexample at depths 10 (verification time: 5.45 s) and 15 (verification time: 85.81 s). However, at depth 20 (verification time: 479.74 s), a counterexample was produced. This showed how actions could be spaced out so as to avoid air traffic controller overload. This scenario is explored in more depth in the subsequent section.
Finally, we attempted to check whether the get–set resource conflict could occur for aircraft one, where the conflict occurs when the air traffic controller’s actions for managing an aircraft’s progress toward the waypoint and the pilot’s action for directing the corresponding aircraft to the waypoint occur at the same time. This produced a counterexample at depth 10 (verification time: 10.7 s). This showed that these two actions could produce a resource conflict in the original scenario.
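To make the checked condition concrete, the sketch below detects get–set conflicts in a list of timed actions: a conflict arises when two concurrent actions touch the same resource, one reading it and one writing it. The `Action` fields, the half-open-interval overlap test, and the resource names are illustrative assumptions, not the actual formal encoding of Eq. (10).

```python
from dataclasses import dataclass

@dataclass
class Action:
    agent: str
    start: float
    end: float
    gets: frozenset = frozenset()  # resources the action reads
    sets: frozenset = frozenset()  # resources the action writes

def overlaps(a: Action, b: Action) -> bool:
    """True when the two actions' time intervals intersect."""
    return a.start < b.end and b.start < a.end

def get_set_conflicts(actions):
    """Pairs of concurrent actions where one reads a resource the other writes."""
    conflicts = []
    for i, a in enumerate(actions):
        for b in actions[i + 1:]:
            if overlaps(a, b) and (a.gets & b.sets or a.sets & b.gets):
                conflicts.append((a, b))
    return conflicts
```

For example, if the controller’s progress-management action reads an aircraft’s route while the pilot’s direct-to-waypoint action writes it over an overlapping interval, the pair is flagged, which mirrors the counterexample found above.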
Collectively, these tests demonstrate the method’s ability to identify specification violations and thus generate counterexamples. These counterexamples can be used to create new scenarios.
4.2 Modifying the simulation scenario with counterexamples
Next, we wanted to test that our method could accurately be used to generate a new scenario illustrating the action sequence from a counterexample. Because only the specification that generated a counterexample with no overload [Eq. (9)] represented an action sequence that was different from the original scenario, this was used. Thus, we used our automated method to create a new scenario (a modified version of the original) where the actions were spaced out as specified in the counterexample. A manual examination of the resulting action trace revealed that the rescheduled actions were being performed at the times specified in the counterexample and that these timings avoided overload.
5 Conclusions and discussion
In this paper, we described a formal modeling architecture designed to enable the discovery of interesting human operator conditions that relate to human taskload and resource conflicts between agents through the synergistic use of formal verification and simulation. The presented method allows us to formally represent all of the relevant WMC concepts while satisfying our objectives: (a) it uses timed automata to represent real-valued time and allows for sensitivity analyses of WMC concepts based on variance in the timing of actions; (b) it allows taskload to be modeled by having the model reason over an array of actions, each with its own state and associated with a different agent; and (c) by using \(\lambda \) operations over the set of actions and by abstracting away WMC details unimportant to the formal analyses, the architecture is computationally efficient. A number of specification properties can be used to reason about taskload for use in generating counterexamples. Finally, we showed how these specification properties could be used to find scenarios that drive WMC simulations to interesting conditions.
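The taskload side of (b) and (c) can be illustrated with a small sketch, where a comprehension stands in for the formal model’s \(\lambda \) operation over the array of actions. The dictionary fields and the queue-capacity parameter are assumptions made for illustration; the actual formal encoding lives in the SAL model.

```python
def taskload(actions, agent, t):
    """Number of the given agent's actions that are pending or executing
    at time t. The generator expression plays the role of a lambda
    operation filtering the shared array of actions."""
    return sum(1 for a in actions
               if a["agent"] == agent and a["start"] <= t < a["end"])

def overloaded(actions, agent, t, capacity):
    """True when taskload exceeds the agent's priority-queue capacity,
    i.e., more actions are active than the agent can perform or
    remember at once (the condition probed by the overload property)."""
    return taskload(actions, agent, t) > capacity
```

Spacing actions out in time, as the counterexample to Eq. (9) did, lowers the count returned by `taskload` at any instant and so keeps `overloaded` false.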
Previous work that has used formal verification with simulation has either focused on analyzing limited aspects of a simulation [26, 27, 31, 65] or on only model checking the actual simulation traces [15, 57, 64]. As such, this work makes a significant contribution in that it introduces a new way of using simulation and formal verification synergistically by allowing model checking to be used to explore the space around simulation scenarios. This is a major breakthrough for several reasons. From a formal methods perspective, this approach allows formal verification to be used in the analysis of complex systems that would have been too big and/or complex to be evaluated in the past. From a simulation perspective, our method allows for a much more complete analysis of the space around a simulation trace than was previously possible. This should allow analysts to use high-fidelity simulations to find potential problems that may not have been found otherwise.
While researchers have investigated how to formally model workload [40, 56], those models have been used in simulation environments, not formal verification. Thus, our development is significant in that it constitutes the first effort to account for human operator taskload in formal verification analyses. By using lambda calculus sets as the means of modeling taskload, we have developed a representation that should be computationally tractable in a number of different situations. Given the important roles that taskload and workload play in human performance and system safety, this advance will enable formal verification to be used for human factors engineering purposes in new and important ways.
The presented research focused primarily on the technical challenges of realizing our method. It thus stops short of a full application of the method to a realistic problem. There are also ways the method could be improved. These issues are explored below.
5.1 Scalability
An examination of the verification results shows that verification times increase exponentially with the search depth (see Fig. 4). This is a potentially limiting factor for our analyses, as verifications could take so long to run that they would be prohibitive for a long, complex simulation scenario.
For properties that relate to taskload, each agent in our formal modeling architecture operates independently of the others. Thus, for these types of specifications, verifications for each agent could be run separately. The resulting new action times from each analysis can then be integrated into the new scenario.
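A minimal sketch of this decomposition, assuming a hypothetical `check_agent` callback that stands in for a separate per-agent verification run and returns rescheduled start times keyed by action id:

```python
def verify_agents_independently(actions, agents, check_agent):
    """Check a taskload property for each agent on just that agent's
    actions, then merge the rescheduled action times each check
    produces into one new scenario.

    check_agent(agent, own_actions) models one verification run and
    returns {action_id: new_start_time} ({} when nothing changes).
    """
    merged = {}
    for agent in agents:
        own = [a for a in actions if a["agent"] == agent]
        merged.update(check_agent(agent, own))
    return merged
```

Because the agents’ taskload models do not interact, each per-agent run searches a much smaller state space than a joint verification would, which is the source of the speedup.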
Although not discussed extensively here, our method was designed to be used iteratively, where an analyst can: use the method to modify one portion of a scenario, generate a new scenario, run it, observe the effect, and apply the method on a later part to further modify the scenario. In this way, small time windows can be used effectively to explore longer scenarios while avoiding scalability issues. Further, the iterative approach to the deployment of the method should allow the impact of a minor change early in the scenario to be fully realized in subsequent scenario runs.
It is important to note that even with all of the agents included in the verification analyses, the longest verification time we observed was 479.74 s (Fig. 4). In our analyses, the average WMC run took \(\sim \) 650 s. Thus, though the formal verification analyses can take a significant amount of time, they are effectively considering an infinite number of simulation scenarios. As such, the dual use of formal verification and simulation can be viewed as having a scalability advantage in that model checking allows multiple scenarios to be explored more efficiently than could be achieved by only running multiple simulations. For example, for the analyses presented in this paper, the scenarios discovered by running the WMC simulation scenario and then applying our model checking process required less computational time than two full simulation runs. If only simulation were being employed, many more simulations would be required to find the same types of conditions, if they were ever found at all.
5.2 Abstraction
Because the exact timing of actions was critical to the way taskload was computed (this would be even truer if variance in action timing were included), we modeled it explicitly in our method. However, this is likely a major factor in the verification times reported in the previous section. It is conceivable that abstraction techniques could remove the need to represent real time in the model and thus enable the use of more traditional symbolic model checking. In fact, automated techniques exist for learning such abstractions from hybrid models that contain both discrete and continuous components [58, 59]. Future work should explore how abstraction techniques could improve the scalability of our approach.
5.3 Improvements to the translation processes
While automated, the translation processes still require some manual analyst intervention to identify the sizes of priority queues (if they are not part of the WMC model), timing variances, and priorities. Further, when translating a counterexample back into a WMC scenario, our method automatically creates WMC code that must be manually placed in a WMC scenario C++ file, and the WMC C++ source must then be recompiled. In future work, we hope to rely less on these manual human operations. To accomplish this, we hope to use XML descriptions of the work models and scenarios. Such an XML specification would create a universal starting point for modeling new systems of interest and allow rapid changes to existing work models. Having this abstracted specification would support collaborators or general users who are unfamiliar with WMC or SAL but familiar with XML markup. Additionally, an XML specification would allow XML files of work models and scenarios to be translated back and forth between WMC simulations and SAL experiments without manual placement of source code or WMC recompilation. The XML parser currently translates work model and scenario XML files into WMC and WMC scenarios into XML: XML files are parsed, and XML statements are subsequently generated to populate the WMC template. We envision extending the WMC-to-XML translation to parse WMC C++ files and dynamically generate XML tag statements at runtime to populate XML template files.
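As an illustration of the envisioned round trip, the sketch below parses and re-emits a scenario using a hypothetical XML schema. The element and attribute names (`scenario`, `action`, `agent`, `time`) are our assumptions for illustration, not the planned WMC specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: <scenario><action agent=... name=... time=.../></scenario>
SCENARIO_XML = """
<scenario name="demo">
  <action agent="atc" name="manage_progress" time="1.5"/>
  <action agent="pilot1" name="direct_to_waypoint" time="1.5"/>
</scenario>
"""

def parse_scenario(xml_text):
    """Read a scenario XML string into a list of action records."""
    root = ET.fromstring(xml_text)
    return [
        {"agent": a.get("agent"), "name": a.get("name"), "time": float(a.get("time"))}
        for a in root.findall("action")
    ]

def emit_scenario(name, actions):
    """Write action records back out as scenario XML (the reverse direction)."""
    root = ET.Element("scenario", name=name)
    for a in actions:
        ET.SubElement(root, "action", agent=a["agent"], name=a["name"],
                      time=str(a["time"]))
    return ET.tostring(root, encoding="unicode")
```

Because both directions operate on the same records, a counterexample’s rescheduled action times could be spliced into the parsed list and re-emitted without touching C++ source.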
5.4 Additional sources of variance
Our method only allows for variance of action timing and sequencing due to non-determinism in action prioritization. This allowed us to use formal verification to test the taskload and resource conflict properties we were interested in. However, it does not allow for variations in the actions that are performed due to human error or changes in the environmental conditions. These could also impact taskload and resource conflicts by introducing additional actions or time constraints. Work within the extended formal methods literature has focused on how human error and anomalous system conditions can be generated in formal models so that verifications can assess their impact on system performance [3, 4, 5, 6, 7, 42]. Future work should investigate how these approaches could be adapted for use in our method.
5.5 Expanded application
The AAR Studies application presented here showed that our method was capable of finding the desired conditions around simulation traces and generating new simulation scenarios from those results. While the total amount of time represented in the model was small (5 s), the method was able to handle scenarios with four separate agents and nine possible actions among them and produce results in a reasonable amount of time.
While the results presented here were capable of testing the intellectual contributions of our work, the analyses do not give us any particular insights into the larger performance of the represented air traffic control scenario. Further, the balanced FA3-FA3 distribution we tested is only one of many possible distributions of authority and responsibility offered by the AAR Studies scenarios. Future work with the method will explicitly focus on using the new capabilities of the method to more deeply and completely explore how taskload and resource conflicts could impact air traffic control performance under different distributions of authority and responsibility.
Note that specification property patterns are presented with a name, followed by a \(\models \), followed by the specification property logic.
SAL was chosen for this work because of the expressiveness of its input language. This included the ability to model real time, its support of both synchronous and asynchronous composition of modules (though only synchronous composition was ultimately used), and its inclusion of lambda operations.
Note that these timings are not necessarily realistic. The presented tests were concerned with demonstrating the capabilities of our method, not with the realism of the air traffic scenario. More realistic scenarios will be the subject of future work. See Sect. 5.
This work was supported by the grant “Scenario-Based Verification and Validation of Autonomy and Authority” from the NASA Ames Research Center under Award Number NNX13AB71A.
- 9. Bredereke J, Lankenau A (2002) A rigorous view of mode confusion. In: Proceedings of the 21st international conference on computer safety, reliability and security. Springer, London, pp 19–31
- 11. Buth B (2004) Analyzing mode confusion: an approach using FDR2. In: Proceedings of the 23rd international conference on computer safety, reliability, and security. Springer, Berlin, pp 101–114
- 12. Butler RW, Miller SP, Potts JN, Carreño VA (1998) A formal methods approach to the analysis of mode confusion. In: Proceedings of the 17th digital avionics systems conference. IEEE, Piscataway, pp C41/1–C41/8
- 14. Campos JC, Harrison MD (2011) Modelling and analysing the interactive behaviour of an infusion pump. In: Proceedings of the fourth international workshop on formal methods for interactive systems. EASST, Potsdam
- 15. Chen X, Hsieh H, Balarin F, Watanabe Y (2003) Automatic trace analysis for logic of constraints. In: Proceedings of the design automation conference. IEEE, pp 460–465
- 16. Clarke EM, Grumberg O, Peled DA (1999) Model checking. MIT Press, Cambridge
- 17. Committee on Autonomy Research for Civil Aviation, Aeronautics and Space Engineering Board, Division on Engineering and Physical Sciences, National Research Council (2014) Autonomy research for civil aviation: toward a new era of flight. National Academy of Sciences, Washington
- 18. de Moura L, Owre S, Shankar N (2003) The SAL language manual. Technical report CSL-01-01. Computer Science Laboratory, SRI International, Menlo Park
- 19. de Moura L, Owre S, Rueß H, Rushby J, Shankar N, Sorea M, Tiwari A (2004) SAL 2. In: International conference on computer aided verification. Springer, pp 496–500
- 20. Degani A (2004) Taming HAL: designing interfaces beyond 2001. Macmillan, New York
- 22. Derrick J, North S, Simons T (2006) Issues in implementing a model checker for Z. In: International conference on formal engineering methods. Springer, pp 678–696
- 23. Dutertre B, Sorea M (2004) Timed systems in SAL. Technical report NASA/CR-2002-211858, SRI International
- 24. Emerson EA (1990) Temporal and modal logic. In: van Leeuwen J, Meyer AR, Nivat M, Paterson M, Perrin D (eds) Handbook of theoretical computer science, vol 16. MIT Press, Cambridge, pp 995–1072
- 25. Feigh KM, Pritchett AR, Mamessier S, Gelman G (2014) Generic agent models for simulations of concepts of operation: part 2. J Aerosp Inf Syst 11:623–631
- 26. Gelman G, Feigh KM, Rushby J (2013) Example of a complementary use of model checking and agent-based simulation. In: IEEE international conference on systems, man and cybernetics. IEEE, Piscataway, pp 900–905
- 28. Gelman GE (2012) Comparison of model checking and simulation to examine aircraft system behavior. Ph.D. thesis, Georgia Institute of Technology
- 30. Houser A, Ma LM, Feigh K, Bolton ML (2015) A formal approach to modeling and analyzing human taskload in simulated air traffic scenarios. In: 2015 international conference on complex systems engineering, pp 1–6
- 31. Hu AJ (2008) Simulation vs. formal: absorb what is useful; reject what is useless. In: Proceedings of the third international Haifa verification conference. Springer, Berlin, pp 1–7
- 32. IJtsma M, Hoekstra J, Bhattacharyya RP, Pritchett A (2015) Computational assessment of different air-ground function allocations. In: Eleventh USA/Europe air traffic management research and development seminar
- 33. IJtsma M, Pritchett AR, Bhattacharyya RP (2015) Computational simulation of authority–responsibility mismatches in air-ground function allocation. In: Proceedings of the 18th international symposium on aviation psychology. Wright State University, Dayton
- 34. Joshi A, Miller SP, Heimdahl MP (2003) Mode confusion analysis of a flight guidance system using formal methods. In: Proceedings of the 22nd digital avionics systems conference. IEEE, Piscataway, pp 2.D.1-1–2.D.1-12
- 40. Moore J, Ivie R, Gledhill T, Mercer E, Goodrich M (2014) Modeling human workload in unmanned aerial systems. In: 2014 AAAI spring symposium series. AAAI, Palo Alto, pp 44–49
- 41. Palmer E (1995) “Oops, it didn’t arm”—a case study of two automation surprises. In: Proceedings of the 8th international symposium on aviation psychology. Wright State University, Dayton, pp 227–232
- 42. Pan D, Bolton ML (2016) Properties for formally assessing the performance level of human–human collaborative procedures with miscommunications and erroneous human behavior. Int J Ind Ergon. doi: 10.1016/j.ergon.2016.04.001
- 43. Popescu V, Clarke J, Feigh KM, Feron E (2011) ATC taskload inherent to the geometry of stochastic 4-d trajectory flows with flight technical errors. CoRR abs/1102.1660
- 44. Pritchett A, Feigh K (2011) Simulating first-principles models of situated human performance. In: Proceedings of the IEEE first international multi-disciplinary conference on cognitive methods in situation awareness and decision support. IEEE, Piscataway, pp 144–151
- 45. Pritchett AR (2013) Simulation to assess safety in complex work environments. In: Lee JD, Kirlik A (eds) The Oxford handbook of cognitive engineering, vol 22. Oxford University Press, New York, pp 352–366
- 46. Pritchett AR, Feigh KM, Kim SY, Kannan SK (2014) Work models that compute to describe multiagent concepts of operation: part 1. J Aerosp Inf Syst 11(10):610–622
- 51. Rushby J, Crow J, Palmer E (1999) An automated method to detect potential mode confusions. In: Proceedings of the 18th digital avionics systems conference. IEEE, Piscataway, pp 4.B.2-1–4.B.2-6
- 54. Sherry L, Feary M, Polson P, Palmer E (2000) Formal method for identifying two types of automation-surprises. Technical report C69-5370-016, Honeywell, Phoenix
- 55. Smith G, Wildman L (2005) Model checking Z specifications using SAL. In: ZB 2005: formal specification and development in Z and B. Springer, pp 85–103
- 56. Stocker R, Rungta N, Mercer E, Raimondi F, Holbrook J, Cardoza C, Goodrich M (2015) An approach to quantify workload in a system of agents. In: Proceedings of the 14th international conference on autonomous agents and multiagent systems. IFAAMAS, Liverpool
- 58. Tiwari A (2003) HybridSAL: modeling and abstracting hybrid systems. Technical report
- 59. Tiwari A, Khanna G (2002) Series of abstractions for hybrid automata. In: Tomlin CJ, Greenstreet MR (eds) Proceedings of the 5th international workshop on hybrid systems: computation and control. Springer, Berlin, Heidelberg, pp 465–478. doi: 10.1007/3-540-45873-5_36
- 61. Weyers B, Bowen J, Dix A, Palanque P (eds) (2017) The handbook of formal methods in human–computer interaction. Springer, Cham
- 62. Wheeler PH (2007) Aspects of automation mode confusion. Master’s thesis, Massachusetts Institute of Technology, Cambridge
- 63. Wing JM (1990) A specifier’s introduction to formal methods. Computer 23(9):8, 10–22, 24
- 64. Yasmeen A, Feigh KM, Gelman G, Gunter EL (2012) Formal analysis of safety-critical system simulations. In: Proceedings of the 2nd international conference on application and theory of automation in command and control systems. IRIT Press, pp 71–81
- 65. Yuan J, Shen J, Abraham J, Aziz A (1997) On combining formal and informal verification. In: Grumberg O (ed) Proceedings of the 9th international conference on computer aided verification. Springer, Berlin, Heidelberg, pp 376–387. doi: 10.1007/3-540-63166-6_37