
1 Introduction and Problem Description

The domain of military helicopter missions is characterized by complexity and uncertainty, especially if the pilots command several unmanned aerial vehicles (UAVs) from the cockpit of a manned helicopter (manned-unmanned teaming, MUM-T) [1, 2]. During their mission, the pilots must perform many cognitive tasks that require different amounts of mental resources. While processing those tasks, the workload of the pilots can vary widely with the task situation. Especially in situations in which the pilots are overloaded, decrements in human performance might occur. These decrements can have a negative impact on the overall performance of a helicopter mission.

To counteract performance decrements in high workload situations, automation is often used to reduce the mental workload (MWL) of the operator of a technical system. However, the use of automation is not without problems, and human factors issues play a central role. Among them are deskilling of the operator [3], boredom, complacency effects and clumsiness of the automation [4, 5], mode errors and mode confusion [6], loss of situation awareness [7], as well as design factors like complexity, brittleness, opacity and literalism [8].

To tackle those human factors problems in the domain of aviation, cognitive automation and cognitive associate systems have been developed over the years to support the human pilots [9,10,11,12,13]. In order to keep the pilots in the loop and not to disturb the work process, this support should adapt to the tasks the pilot is currently conducting. Furthermore, the goal of adaptive automation is to keep the operators' workload at a moderate level and thus maintain human performance even in extreme workload situations [14, 15]. Workload-adaptive associate systems use adaptive automation to support the human operators.

According to [16], MWL is a multidimensional construct determined by characteristics of the task, of the operator and, to a degree, the environmental context. Our current approach towards engineering a workload-adaptive associate system is given by [17, 18]. It is based on a context-rich description of MWL. A basic assumption in this approach is that the MWL of the operators qualitatively follows the current task load induced by the task situation [19]. Therefore, a task-centered design has been proposed. With the application in an associate system in mind, this concept operationalizes the construct MWL in a context-rich way: as a plan (the tasks an operator has to do), the current activity, the demands on mental resources associated with this activity, and observable behavior patterns. The necessary knowledge is stored in a common task model. In order to derive mental workload, the mission has to be planned [20], the current activity determined, demands on mental resources estimated [13] and behavior patterns analyzed [21]. A prerequisite for determining MWL in a dynamic mission and specific task situation is a reliable determination of the current activity of the human operator.

In this task-centered approach, the human operators communicate with artificial team members. According to [22], anticipatory information sharing by implicit deliberative communication can improve team performance in complex task situations. Our task-centered method provides a means of implicit, non-verbal deliberative communication, since the human pilots implicitly inform the artificial crew member about their tasks and therefore their goals. Our method for activity determination thus lays a foundation for better performance in mixed human-machine teams like the manned-unmanned teaming of a helicopter and UAVs. The following sections give deeper insights into how activity determination can be designed and implemented using evidential reasoning.

2 Method

2.1 Requirements for Activity Determination

Activity determination can be regarded as a classification problem. If the current activity of a human operator is described by a set of tasks contained in a task model, the method must determine, for every task, whether the operator is currently conducting it or not. For a successful and robust determination of the activity, different requirements must be met:

Activity determination must work in an environment characterized by uncertainty and ignorance. Since there is no direct link into the human brain, measurement sensors like buttons, speech detectors or gaze trackers must be used to draw conclusions about the underlying cognitive processes. These sensors introduce uncertainty because each measurement has an individual measuring inaccuracy. As a consequence, hypotheses generated from those measurements have limited reliability. Ignorance results from missing knowledge about model parameters and the environment. If a sensor is temporarily out of order, all measurements of this sensor are completely unreliable. This is, for example, the case for a gaze tracking system: loss of gaze tracking can occur if the test person is blinking or covering a camera. Not only the measurement equipment, but also the human behavior can be faulty; a simple example is a pilot forgetting to set system parameters like radio frequencies. Therefore, the algorithms must be able to deal with uncertainty and ignorance and weight the resulting hypotheses of the sensors according to their measurement accuracy. Since activity determination must run automatically in real-time as part of an associate system, these algorithms must also be fast enough. The cognitive processes inside the human brain run on a time scale of tens to hundreds of milliseconds [23]. Therefore, activity determination should also work on this time scale.

Methods that meet those requirements and seem promising for activity determination are probability theory, Dempster-Shafer theory (DST) [24, 25] and certainty factors [26]. Because of its simplicity and mathematical foundation, probability theory, and especially Bayesian Networks [27], are suited for solving classification problems [28]. For example, Naïve Bayesian Classifiers can be used to filter junk email [29, 30]. Bayesian Networks model the reasoning process in the causal direction and use Bayes' Rule to solve diagnostic problems. The disadvantage of using probability theory is that all model parameters (a-priori and conditional probabilities) must be determined from statistical analysis prior to using the model. Certainty factors are a heuristic approach, where the parameters are obtained from expert knowledge. Wittig has used certainty factors for determining pilot intent [31]. DST is also a theory based on a mathematical framework. From a certain point of view, it can be regarded as an extension of probability theory. In contrast to probability theory, DST distinguishes between uncertainty and ignorance. Considering ignorance enables an associate system to be aware of its own lack of knowledge, which can be useful when it comes to critical decisions. In those situations, it is usually better if the human operator decides what to do, rather than a system with incomplete knowledge. DST is a method for evidential reasoning and extends probability theory by providing a rule of combination for diagnostic problems. The difficulty in DST is the interpretation of the parameters and how they are acquired. Besides certainty factors, DST has also been applied in the MYCIN expert system for computer-based medical consultations [32].

Because it provides a rule for combining evidences, considers both uncertainty and ignorance, and rests on a mathematical foundation, we suggest DST for determining the pilot's activity. We are using a simplified version of DST-based evidential reasoning, because the drawback of the full theory is an exponential computational complexity, which makes it difficult to implement in real-time [32].

2.2 Evidential Reasoning

Since the execution of tasks, especially cognitive tasks, cannot be directly observed, we use an indirect method for classifying whether a task is currently being executed by a pilot or not. The evidential reasoning method we use resembles the way a human observer would proceed (see Fig. 1). During the observation of the pilots, the technical system and the environment, many observable facts are collected. Each of those facts proposes a hypothesis on whether an operator is currently executing a certain task or not. These single evidences are weighted according to their plausibility and combined into one resulting hypothesis.

Fig. 1. Process of evidential reasoning in the domain of helicopter missions.

The underlying reasoning model is described in Sects. 2.3 and 2.4. The processing chain from collecting observations to generating evidences is explained in Sect. 2.5. The inference algorithm for combining evidences to draw conclusions on the actual activity is explained in Sect. 2.6.

2.3 Representing Uncertainty and Ignorance

In probability theory, the strength of belief in a hypothesis \( X \) is represented by the scalar probability \( P(X) \). \( P(X) \) describes the belief in this hypothesis and \( 1 - P(X) \) the doubt. In contrast to probability theory, we represent the strength of belief in a hypothesis \( X \) by a belief triplet according to [33]:

$$ \varvec{Q}\left( X \right) = (p, q, r) $$
(1)

This triplet is a normalized distribution:

$$ p + q + r = 1 $$
(2)

The belief quantities \( p \), \( q \) and \( r \) can take continuous values in the range from \( 0 \) to \( 1 \). The quantity \( p \) is called belief and signifies to which extent the hypothesis is supported, \( q \) the doubt, to which extent the hypothesis is rejected, and \( r \) the remaining ignorance. There are different interpretations of belief values. According to the interpretation of Dempster [24], the ignorance \( r \) describes uncertainty in quantifying probabilities, \( p \) is a lower probability value and \( 1 - q \) an upper probability value (plausibility), where the actual probability lies somewhere in between. Other interpretations do not try to relate belief values directly to probabilities. Shortliffe, for example, interprets belief values in certainty factor theory as an increase in information and gives the following example: “I don’t know what the probability is that all ravens are black, but I do know that every time you show me an additional black raven my belief is increased by X that all ravens are black” [26].

Similar to conditional probability tables, conditional belief distributions can be defined. In our method, these distributions are described by matrices:

$$ \varvec{M}_{S} \left( {B |A} \right) = \left( {\begin{array}{*{20}c} {p_{t} } & {p_{f} } & 0 \\ {q_{t} } & {q_{f} } & 0 \\ {r_{t} } & {r_{f} } & 1 \\ \end{array} } \right) $$
(3)

In this definition, the matrix components signify the following:

$$ \begin{array}{lll} p_{t} : & {\text{belief that}}\;B = true & {\text{if}}\;A = true \\ q_{t} : & {\text{belief that}}\;B = false & {\text{if}}\;A = true \\ r_{t} : & {\text{belief that}}\;B = unknown & {\text{if}}\;A = true \\ p_{f} : & {\text{belief that}}\;B = true & {\text{if}}\;A = false \\ q_{f} : & {\text{belief that}}\;B = false & {\text{if}}\;A = false \\ r_{f} : & {\text{belief that}}\;B = unknown & {\text{if}}\;A = false \\ \end{array} $$
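To make this representation concrete, the following minimal C++ sketch shows how a belief triplet (1) and a conditional belief distribution (3) could be stored and propagated; the type and function names are our own illustration, not identifiers from our implementation.

```cpp
#include <array>

// Belief triplet Q(X) = (p, q, r) with p + q + r = 1, Eqs. (1) and (2).
struct BeliefTriplet {
    double p;  // belief: extent to which the hypothesis is supported
    double q;  // doubt: extent to which the hypothesis is rejected
    double r;  // ignorance: remaining lack of knowledge
};

// Conditional belief distribution M(B|A) as a 3x3 matrix, Eq. (3).
// Column order: A = true, A = false, A = unknown.
using BeliefMatrix = std::array<std::array<double, 3>, 3>;

// Forward belief propagation as a matrix-vector product; since each column
// of the matrix sums to 1, the result is again a normalized triplet.
BeliefTriplet propagate(const BeliefMatrix& m, const BeliefTriplet& in) {
    const std::array<double, 3> v{in.p, in.q, in.r};
    std::array<double, 3> out{0.0, 0.0, 0.0};
    for (int row = 0; row < 3; ++row)
        for (int col = 0; col < 3; ++col)
            out[row] += m[row][col] * v[col];
    return {out[0], out[1], out[2]};
}
```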

2.4 State Space Model

We are using a separate classification model for every possible task that can be conducted by a test person. The state space model for a single task, which classifies whether it is part of the current activity or not, is depicted in Fig. 2.

Fig. 2. State space model.

The state space model is a structured representation of the dependencies between a task, evidences and observations. It consists of nodes and edges and is similar to a Bayesian Network. The nodes symbolize state variables described by belief triplets (1). The root node is the state variable for the examined task \( X \). Several evidence variables \( E_{1} , \ldots ,E_{m} \) are linked to the task \( X \). Since each evidence is based on a hypothesis generated by a measurement sensor, each evidence variable \( E_{j} \) is connected to a sensor variable (observation) \( S_{j} \). The edges in the state space model symbolize reasoning models described by conditional belief distributions (3). The arrows indicate the direction of belief propagation for diagnostic reasoning. There are two types of models: Sensor models and evidence models.

A sensor model describes the accuracy or reliability of a measurement sensor. Besides the general sensor model \( \varvec{M}_{S} \left( {E_{j} |S_{j} } \right) \) according to (3), we define some special types of sensor models:

$$ \varvec{M}_{S} \left( {E_{j} |S_{j} } \right) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad {\text{Perfect sensor}} $$
(4)
$$ \varvec{M}_{S} \left( {E_{j} |S_{j} } \right) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{pmatrix} \qquad {\text{Unknown sensor}} $$
(5)
$$ \varvec{M}_{S} \left( {E_{j} |S_{j} } \right) = \begin{pmatrix} P_{t} & P_{f} & 0 \\ 1 - P_{t} & 1 - P_{f} & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad {\text{Probabilistic sensor}} $$
(6)
$$ \varvec{M}_{S} \left( {E_{j} |S_{j} } \right) = \begin{pmatrix} Z & 0 & 0 \\ 0 & Z & 0 \\ 1 - Z & 1 - Z & 1 \end{pmatrix} \qquad {\text{Sensor with scalar reliability}} $$
(7)

For a perfect sensor, both uncertainty and ignorance are \( 0 \). The unknown sensor model is the opposite of the perfect sensor model and results in complete ignorance for any input. For a Bayesian definition of the sensor model without ignorance, the probabilistic model is used; for representing reliability and the amount of knowledge with a single scalar value \( Z \), the simple scalar reliability model might be used.

The evidence model describes the strength of an evidence for task execution under the assumption that there is no measurement error by the sensor (perfect sensor model). In contrast to the sensor model, the evidence model contains human factors like uncertainty in human behavior. A general model of an evidence can be written in matrix form \( \varvec{M}_{E} \left( {X |E_{j} } \right) \) according to Eq. (3). Besides a perfect, unknown and probabilistic evidence model (compare with the sensor models), we consider two special types of evidence models:

$$ \varvec{M}_{E} \left( {X |E_{j} } \right) = \begin{pmatrix} p & 0 & 0 \\ 0 & 0 & 0 \\ 1 - p & 1 & 1 \end{pmatrix} \qquad {\text{Supporting evidence (belief)}} $$
(8)
$$ \varvec{M}_{E} \left( {X |E_{j} } \right) = \begin{pmatrix} 0 & 0 & 0 \\ q & 0 & 0 \\ 1 - q & 1 & 1 \end{pmatrix} \qquad {\text{Rejecting evidence (doubt)}} $$
(9)

A supporting (belief) evidence increases the strength of belief that the operator is currently performing a task, whereas a rejecting (doubt) evidence increases the strength of doubt in that hypothesis.

2.5 Evidence Processing Chain

For each evidence contained in the state space model, a processing chain is used to calculate its belief values and therefore its evidential strength (Fig. 3).

Fig. 3. Processing chain for evidences.

The raw information about the test persons and their environment is given in an inhomogeneous and sub-symbolic form. In the first step (procedural sub-functions), symbolic hypotheses \( S \) (observations) are derived from sub-symbolic values \( B \). To be compatible with classical logic and Bayesian models, each observation is described by a state variable (see Sect. 2.4), which can be either \( true \), \( false \), or \( unknown \) if the sensor is broken or temporarily out of order. This value is then transformed into a belief triplet:

$$ \varvec{Q}_{S} \left( {S_{j} } \right) = \left\{ {\begin{array}{*{20}c} {\left( {1, 0, 0} \right)^{T} } & {if} & {S_{j} = true} \\ {\left( {0, 1, 0} \right)^{T} } & {if} & {S_{j} = false} \\ {\left( {0, 0, 1} \right)^{T} } & {if} & {S_{j} = unknown} \\ \end{array} } \right. $$
(10)

The reliability of this hypothesis is calculated as a matrix-vector multiplication of the sensor model with this belief triplet:

$$ \varvec{Q}_{S} \left( {E_{j} } \right) = \varvec{M}_{S} (E_{j} |S_{j} ) \varvec{Q}_{S} \left( {S_{j} } \right) $$
(11)

Then the strength of a single evidence for a given point in time is derived from a forward propagation of the belief values similar to [34, 35]. For this calculation, the evidence model is used:

$$ \varvec{Q}_{Ej} \left( X \right) = \varvec{M}_{E} (X |E_{j} ) \varvec{Q}_{S} \left( {E_{j} } \right) $$
(12)
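A sketch of this part of the chain, Eqs. (10)–(12), instantiated for the scalar reliability sensor model (7) and a supporting evidence model (8); all names are illustrative, and the triplet is stored as a plain array (p, q, r).

```cpp
#include <array>

enum class Symbol { True, False, Unknown };
using Triplet = std::array<double, 3>;  // (p, q, r)
using Matrix = std::array<Triplet, 3>;  // rows of a 3x3 belief matrix

Triplet multiply(const Matrix& m, const Triplet& v) {
    Triplet out{0.0, 0.0, 0.0};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) out[i] += m[i][j] * v[j];
    return out;
}

// Eq. (10): map the symbolic observation onto a belief triplet.
Triplet observe(Symbol s) {
    if (s == Symbol::True) return {1.0, 0.0, 0.0};
    if (s == Symbol::False) return {0.0, 1.0, 0.0};
    return {0.0, 0.0, 1.0};
}

// Eq. (7): sensor model with scalar reliability Z.
Matrix sensorModel(double z) {
    Matrix m;
    m[0] = {z, 0.0, 0.0};
    m[1] = {0.0, z, 0.0};
    m[2] = {1.0 - z, 1.0 - z, 1.0};
    return m;
}

// Eq. (8): supporting evidence model with belief strength p.
Matrix supportingEvidence(double p) {
    Matrix m;
    m[0] = {p, 0.0, 0.0};
    m[1] = {0.0, 0.0, 0.0};
    m[2] = {1.0 - p, 1.0, 1.0};
    return m;
}

// Eqs. (11) and (12): reliability-weighted strength of one evidence for task X.
Triplet evidenceStrength(Symbol s, double z, double p) {
    Triplet qs = multiply(sensorModel(z), observe(s));  // Q_S(E_j)
    return multiply(supportingEvidence(p), qs);         // Q_Ej(X)
}
```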

In a time-dependent environment, it is not sufficient to describe evidences only for a given point in time. A dynamic model describes how long an observed evidence remains valid and is indispensable if short events like button presses are used as evidences. A general dynamic model can be defined as an initial value problem, i.e. a differential equation with initial conditions:

$$ \varvec{M}_{D} : \qquad \frac{d}{dt}\varvec{Q}_{Dj} \left( {X,t} \right) = \varvec{f}\left( {\varvec{Q}_{Dj} \left( {X,t} \right)} \right), \qquad \varvec{Q}_{Dj} \left( {X,0} \right) = \varvec{Q}_{Ej} \left( X \right) $$
(13)

We suggest a simple dynamic model, in which belief \( p \) and doubt \( q \) decay after the evidence has been observed. The rate of decay is assumed to be proportional to the current belief values:

$$ \frac{dp\left( t \right)}{dt} = - \lambda \,p\left( t \right), \qquad \frac{dq\left( t \right)}{dt} = - \lambda \,q\left( t \right) $$
(14)

Solving these differential equations leads to an exponential decay over time:

$$ \begin{aligned} p_{Dj} \left( {X,t} \right) & = p_{Ej} \left( X \right)\,\exp \left( { - \lambda t} \right) \\ q_{Dj} \left( {X,t} \right) & = q_{Ej} \left( X \right)\,\exp \left( { - \lambda t} \right) \\ r_{Dj} \left( {X,t} \right) & = 1 - p_{Dj} \left( {X,t} \right) - q_{Dj} \left( {X,t} \right) \end{aligned} $$
(15)

This model imitates the intuitive observation that knowledge gets lost over time while ignorance increases. The decay parameter \( \lambda \) of this model can be expressed as a half-life value, which indicates the time after which the belief has dropped to half of the originally observed value:

$$ t_{{\frac{1}{2}}} = \frac{\ln \left( 2 \right)}{\lambda } $$
(16)
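As a sketch, the closed-form decay (15) with the half-life parametrization (16) amounts to a few lines of C++; the triplet layout is the same illustrative (p, q, r) array as above.

```cpp
#include <array>
#include <cmath>

using Triplet = std::array<double, 3>;  // (p, q, r)

// Eqs. (15) and (16): exponential decay of belief and doubt after an evidence
// was observed at t = 0; the lost belief mass flows into the ignorance r.
Triplet decay(const Triplet& observed, double elapsedSeconds, double halfLife) {
    const double lambda = std::log(2.0) / halfLife;           // Eq. (16)
    const double factor = std::exp(-lambda * elapsedSeconds);
    const double p = observed[0] * factor;
    const double q = observed[1] * factor;
    return {p, q, 1.0 - p - q};  // triplet stays normalized, Eq. (2)
}
```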

2.6 Combination of Evidences

After generating the evidences in favor of or against the execution of a single task, all evidences are combined into a single resulting hypothesis. This is done using a rule of combination, which is derived from Dempster’s rule of combination for a binary frame of discernment. With this rule, two evidences are combined by:

$$ \varvec{Q}_{1} \left( X \right) \oplus \varvec{Q}_{2} \left( X \right)\varvec{ }: = \varvec{Q}_{12} \left( X \right) = \left( {\begin{array}{*{20}c} {p_{12} \left( X \right)} \\ {q_{12} \left( X \right)} \\ {r_{12} \left( X \right)} \\ \end{array} } \right) $$
(17)
$$ \begin{aligned} p_{12} \left( X \right) & = \frac{{p_{1} p_{2} + p_{1} r_{2} + r_{1} p_{2} }}{{1 - \left( {p_{1} q_{2} + q_{1} p_{2} } \right)}} \\ q_{12} \left( X \right) & = \frac{{q_{1} q_{2} + q_{1} r_{2} + r_{1} q_{2} }}{{1 - \left( {p_{1} q_{2} + q_{1} p_{2} } \right)}} \\ r_{12} \left( X \right) & = 1 - p_{12} \left( X \right) - q_{12} \left( X \right) \end{aligned} $$

All evidences resulting from the processing chain are combined by using this rule of combination iteratively:

$$ \varvec{Q}_{total} \left( {X,t} \right) = \varvec{Q}_{D1} \left( {X,t} \right) \oplus \varvec{Q}_{D2} \left( {X,t} \right) \oplus \ldots \oplus \varvec{Q}_{Dm} \left( {X,t} \right) $$
(18)

The order of the evidences does not matter, since the combination rule given above is both associative and commutative. Tasks for which the resulting triplets \( \varvec{Q}_{total} \) have high belief and low doubt values are considered part of the current activity of the human operator.
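A direct C++ transcription of the rule of combination (17) and its iterative application (18) might look as follows; the degenerate case of total conflict (\( p_{1} q_{2} + q_{1} p_{2} = 1 \), i.e. a vanishing denominator) is not handled in this sketch.

```cpp
#include <array>
#include <vector>

using Triplet = std::array<double, 3>;  // (p, q, r)

// Eq. (17): rule of combination for a binary frame of discernment.
Triplet combine(const Triplet& a, const Triplet& b) {
    const double conflict = a[0] * b[1] + a[1] * b[0];  // p1*q2 + q1*p2
    const double norm = 1.0 - conflict;
    const double p = (a[0] * b[0] + a[0] * b[2] + a[2] * b[0]) / norm;
    const double q = (a[1] * b[1] + a[1] * b[2] + a[2] * b[1]) / norm;
    return {p, q, 1.0 - p - q};
}

// Eq. (18): fold all evidences into one resulting hypothesis; the rule is
// associative and commutative, so the order of combination does not matter.
Triplet combineAll(const std::vector<Triplet>& evidences) {
    Triplet total{0.0, 0.0, 1.0};  // total ignorance is the neutral element
    for (const Triplet& e : evidences) total = combine(total, e);
    return total;
}
```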

3 Implementation in a Helicopter Mission Simulator

3.1 System Overview

The described method for activity determination is implemented in the helicopter mission simulator of the Institute of Flight Systems at the Bundeswehr University in Munich. It is part of a situation and workload-adaptive pilot associate system for MUM-T helicopter missions. Figure 4 shows an overview of the system for activity determination.

Fig. 4. System overview of the activity determination.

First, observations are created during pilot monitoring; then, the actual activity is inferred by evidential reasoning.

As the hardware part of the sensors, different input devices are used: the command and display units (CDUs), rotary encoders, flight controls, touch-sensitive multi-function displays (MFDs) [36], microphones in the headsets of the pilots and cameras of a gaze tracking system. Furthermore, states of the mission simulation are considered to capture the task context. For each of these hardware sensor types, we calculate observations consisting of symbolic detection values and quality measures for estimating sensor reliability in different software modules. These modules implement the low-level, procedural signal processing sub-functions of the processing chain described in Sect. 2.5. We explain the process of pilot monitoring for each sensor type below (Sects. 3.2–3.8).

The inference algorithm is implemented in C++ in the program PAD. PAD uses a separate thread for every pilot and performs the calculations in parallel. The observations from different sensors may have different signal propagation delays, which result from the inter-process communication between the sensors and the PAD program. Therefore, in a first step, those delays are corrected by using the actual times of the measurements. This is important to maintain causality and to prevent reasoning problems if signals created later arrive earlier at the activity determination. Then, the sensor model \( \varvec{M}_{S} (E_{j} |S_{j} ) \) for a scalar reliability value \( Z \) (7) is used to derive the observation triplet \( \varvec{Q}\left( {E_{j} } \right) \) for every evidence. The processing chain and the rule of combination are implemented as described in Sects. 2.5 and 2.6. Figure 5 shows an excerpt from the resulting belief distributions.

Fig. 5. Example results of the activity determination, screenshot of the program PAD. Here, the pilot is flying a manual transit flight and communicating via the intercom at the same time.

The final step is to extract the actual activity of the pilots from those distributions, i.e. a classification decision must be made by interpreting the belief triplet. We consider a task as part of the activity if the belief outweighs doubt and ignorance combined:

$$ p > q + r $$
(19)

Since the belief triplet is normalized (2), this criterion is equivalent to

$$ p > 0.5 . $$
(20)

In practice, using just this threshold might result in oscillations and causality problems for very short events (e.g. button presses and state changes at the same time). Therefore, we add a final filtering step, which integrates the belief over a short time. To select the actual tasks of the activity, an impact value is calculated while the belief is greater than 0.5:

$$ I = \int_{t:\;p\left( t \right) > 0.5} p\left( t \right)\,dt $$
(21)

If the impact of the evidence is high enough (in our simulator, a threshold value of \( I = 0.4 \) is used), the task is classified as part of the current activity. The integration is stopped when the current belief drops below 0.5. The drawback of this last filter step is an additional time delay of the detection decision. The integration time depends on the strength of belief: the higher \( p \), the shorter the integration time and the faster the decision.
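The following sketch illustrates the decision filter of Eqs. (19)–(21) with a simple rectangle-rule integration; the class name, the sampling interface and the reset behavior once the belief drops below 0.5 are our own illustrative choices, not the exact PAD implementation.

```cpp
// Decision filter, Eqs. (19)-(21): a task is classified as part of the
// activity once the integral of the belief p(t), accumulated while p > 0.5,
// exceeds an impact threshold (0.4 in our simulator).
class ImpactFilter {
public:
    explicit ImpactFilter(double threshold = 0.4) : threshold_(threshold) {}

    // Feed one sample of the combined belief p; dt is the sample interval in
    // seconds. Returns true while the task counts as part of the activity.
    bool update(double p, double dt) {
        if (p > 0.5) {
            impact_ += p * dt;  // rectangle-rule approximation of Eq. (21)
        } else {
            impact_ = 0.0;      // integration stops when the belief drops
            active_ = false;
        }
        if (impact_ >= threshold_) active_ = true;
        return active_;
    }

private:
    double threshold_;
    double impact_ = 0.0;
    bool active_ = false;
};
```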

3.2 General Rule for Calculating Sensor Reliability for a Continuous Value

An observable is a measurable quantity in the simulator. Most of the sub-symbolic observables are continuous, time-dependent signals. If an observable is evaluated during the process of activity determination (i.e. a detector has been triggered), we call it an observation. In the implementation, we are using the scalar reliability sensor model (7). Therefore, every observation is described by a symbolic hypothesis \( S \) along with a scalar reliability or quality value \( Z \).

For generating a symbolic value from a sub-symbolic, continuous quantity \( x \) in the procedural sub-functions (Sect. 2.5), we use a general threshold criterion to decide whether a detector has been triggered or not:

$$ S\left( x \right) = \left\{ {\begin{array}{*{20}c} {true} & {if} & {x \ge x_{d} } \\ {false} & {if} & {x < x_{d} } \\ {unknown} & {if} & {sensor\, out\,of\,order} \\ \end{array} } \right. $$
(22)

The scalar reliability value is calculated as the linear distance to the threshold \( x_{d} \):

$$ Z\left( x \right) = \left\{ {\begin{array}{*{20}c} {\frac{{x - x_{d} }}{{x_{max} - x_{d} }}} & {if} & {x \ge x_{d} } \\ {\frac{{x - x_{d} }}{{x_{min} - x_{d} }}} & {if} & {x < x_{d} } \\ 0 & {if} & {sensor\,out\,of\,order} \\ \end{array} } \right. $$
(23)

A perfect measurement corresponds to the reliability \( Z = 1 \) and a totally unreliable measurement to \( Z = 0 \). For practical purposes, the reliability value is bounded: values below 0 are set to 0 and values above 1 are set to 1. Besides the discrimination threshold \( x_{d} \), this model requires a lower limit \( x_{min} \) and an upper limit \( x_{max} \) as parameters, which are defined below for every sensor type.
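A sketch of this general detector, Eqs. (22) and (23), including the clamping described above; an empty optional models a sensor that is out of order, and all names are illustrative.

```cpp
#include <algorithm>
#include <optional>

enum class Symbol { True, False, Unknown };

struct Observation {
    Symbol s;  // symbolic hypothesis, Eq. (22)
    double z;  // scalar reliability, Eq. (23), clamped to [0, 1]
};

// General threshold detector of Sect. 3.2 for a continuous observable x;
// xd is the discrimination threshold, xmin and xmax the sensor-specific limits.
Observation detect(std::optional<double> x, double xd, double xmin, double xmax) {
    if (!x) return {Symbol::Unknown, 0.0};
    if (*x >= xd)
        return {Symbol::True, std::clamp((*x - xd) / (xmax - xd), 0.0, 1.0)};
    return {Symbol::False, std::clamp((*x - xd) / (xmin - xd), 0.0, 1.0)};
}
```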

3.3 Button Press Detection

In the cockpit, there are many buttons the pilots can press. Examples are buttons of the CDUs or buttons on the grip of the flight controls (Fig. 6). Buttons can either represent system states (e.g. radio button) or short events, if they are pressed (e.g. line selection keys on the CDU).

Fig. 6. CDUs and flight controls with buttons [37].

Buttons in aviation must be extremely reliable. Even the buttons in the simulator are highly reliable, and we do not expect them to fail or deliver inaccurate results during a typical mission of less than two hours. Therefore, a button press is described by a perfect sensor model (4), which is equal to a scalar reliability model (7) with \( Z = 1 \):

$$ S = \left\{ {\begin{array}{*{20}c} {true} & {if} & {button\,pressed} \\ {false} & {if} & {button\,not\,pressed} \\ \end{array} } \right. $$
(24)
$$ Z = 1 $$
(25)

If the state of a button can take more than two discrete values, which can be the case for switches and rotary buttons, a binary model is generated for every possible value.

3.4 Movement Detection of the Flight Controls

For steering the simulated helicopter, a control loading system from Reiser Simulation and Training GmbH has been integrated into the simulator (Fig. 7). The system consists of a cyclic stick, which controls the pitch and roll movement of the helicopter, pedals, which control the yaw movement, and a collective lever, which controls the collective pitch of the rotor blades. The flight controls of both pilots are electrically coupled, so that they physically perform the same movements in parallel.

Fig. 7. Flight controls of the helicopter simulator: collective, cyclic and pedals.

The movement detection of the flight controls is based on measuring the rate \( r(t) \) (i.e. the absolute time derivative) of the flight control signal \( x(t) \) of each axis:

$$ r\left( t \right) = \left| {\frac{dx\left( t \right)}{dt}} \right| $$
(26)

If the rate of one of the axes is greater than the threshold \( r_{d} \), the controls are considered to be moved by the pilot:

$$ S\left( r \right) = \left\{ {\begin{array}{*{20}c} {true} & {if} & {r \ge r_{d} } \\ {false} & {if} & {r < r_{d} } \\ {unknown} & {if} & {sensor\,out\,of\,order} \\ \end{array} } \right. $$
(27)

For estimating the detection quality, the lower limit in Eq. (23) corresponds to no movement at all (\( x_{min} = 0 \)). The upper limit \( x_{max} \) is determined during calibration from a fast movement of the flight controls. The reliability value depends on the rate of movement:

$$ Z\left( r \right) = \left\{ {\begin{array}{*{20}c} {\frac{{r - r_{d} }}{{r_{max} - r_{d} }}} & {if} & {r \ge r_{d} } \\ {\frac{{r_{d} - r}}{{r_{d} }}} & {if} & {r < r_{d} } \\ 0 & {if} & {sensor\,out\,of\,order} \\ \end{array} } \right. $$
(28)
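Movement detection is thus an instance of the general detector of Sect. 3.2 with \( x = r \), \( x_{min} = 0 \) and \( x_{max} = r_{max} \); a minimal sketch of the rate estimate (26) from two successive samples (function name and sampling interface are illustrative):

```cpp
#include <cmath>

// Eq. (26): finite-difference estimate of the control movement rate, i.e. the
// absolute time derivative of one control axis signal x(t); dt is the sample
// interval in seconds. The result feeds the general detector of Sect. 3.2.
double movementRate(double previousX, double currentX, double dt) {
    return std::abs((currentX - previousX) / dt);
}
```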

3.5 Speech Detection

The goal of speech detection is to measure the auditory interactions of the pilots with the system and to detect whether they are speaking or not. The content of the speech signal is currently not interpreted.

The raw data is the amplitude signal \( A(t) \) of a microphone integrated in the headset of a pilot (see Fig. 8). Speech detection is based on calculating the power level \( P \) by averaging the squared amplitude over a short time interval \( \varDelta t \) (here a few hundred milliseconds).

Fig. 8. Integrated gaze tracking system Smart Eye Pro [38].

$$ P\left( t \right) = \frac{1}{\varDelta t}\int_{{t^{\prime} = t - \varDelta t}}^{t} {dt^{\prime} A^{2} (t^{\prime})} $$
(29)

For detecting speech, a power level \( P_{d} \) is used as the discrimination threshold \( x_{d} \) in Eq. (22):

$$ S\left( P \right) = \left\{ {\begin{array}{*{20}c} {true} & {if} & {P \ge P_{d} } \\ {false} & {if} & {P < P_{d} } \\ {unknown} & {if} & {sensor\,out\,of\,order} \\ \end{array} } \right. $$
(30)

Two distinct power levels can be identified: the signal level \( P_{signal} \) if the pilot is speaking, and the noise level \( P_{noise} \) if the pilot is not speaking but the signal is still disturbed by noise from the environment. These parameters result from a calibration, during which the power level is recorded for a few seconds. The closer the current power level \( P \) is to the detection threshold, the lower the reliability of the measurement:

$$ Z\left( P \right) = \left\{ {\begin{array}{*{20}c} {\frac{{P - P_{d} }}{{P_{signal} - P_{d} }}} & {if} & {P \ge P_{d} } \\ {\frac{{P - P_{d} }}{{P_{noise} - P_{d} }}} & {if} & {P < P_{d} } \\ 0 & {if} & {sensor\,out\,of\,order} \\ \end{array} } \right. $$
(31)
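The power level (29) can be computed with a sliding window over the squared microphone samples, after which Eqs. (30) and (31) reduce to the general detector of Sect. 3.2 with \( x_{d} = P_{d} \), \( x_{max} = P_{signal} \) and \( x_{min} = P_{noise} \). A sketch with an assumed fixed-size sample window:

```cpp
#include <cstddef>
#include <vector>

// Sliding-window estimate of the power level P(t), Eq. (29): the mean of the
// squared microphone amplitude over the most recent windowSize samples.
class PowerLevel {
public:
    explicit PowerLevel(std::size_t windowSize) : window_(windowSize, 0.0) {}

    double update(double amplitude) {
        sum_ -= window_[index_];                  // drop the oldest sample
        window_[index_] = amplitude * amplitude;  // store the squared amplitude
        sum_ += window_[index_];
        index_ = (index_ + 1) % window_.size();
        return sum_ / static_cast<double>(window_.size());
    }

private:
    std::vector<double> window_;
    std::size_t index_ = 0;
    double sum_ = 0.0;
};
```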

3.6 Gaze Tracking

To obtain evidences from the visual interactions with the cockpit displays, a commercial gaze tracking system (Smart Eye Pro, Smart Eye AB) has been installed in the simulator [38]. It is a video-based system consisting of four cameras for the left pilot and four cameras for the right pilot, running at 60 Hz. The cameras are placed around the MFDs (see the left image of Fig. 8). The gaze tracking method is based on measuring the corneal reflection in the infrared spectrum; it does not intrude on the pilots in their working domain. A 3D world model of the simulator displays and the outside view has been implemented in the system [38] (see the lower left corner of the right image of Fig. 8).

The raw sub-symbolic values of the eye tracker are pixel coordinates on a simulator display. For creating evidences, semantic information about the objects the pilots are looking at is necessary. These objects, for example the airspeed indicator or tactical symbols on the tactical map, are provided by the MFDs [36]. The semantic information about which display, page and object the pilot is looking at is gained by combining the raw pixel coordinates with the layout of the cockpit displays (Fig. 9).

Fig. 9. Gaze tracking system for deriving object-oriented semantic data.

To account for measurement inaccuracies, the gaze on the screen is not described by single pixel coordinates, but rather by a normalized Gaussian distribution over the 2D surface (see Fig. 10):

Fig. 10. Gaze tracking on the PFD and during processing of a check list.

$$ f\left( {x,y} \right) = \frac{1}{{2\pi \sigma_{x} \sigma_{y} \sqrt {1 - \rho^{2} } }}\exp \left( { - \frac{1}{{2\left( {1 - \rho^{2} } \right)}}\left[ {\frac{{\left( {x - x_{0} } \right)^{2} }}{{\sigma_{x}^{2} }} + \frac{{\left( {y - y_{0} } \right)^{2} }}{{\sigma_{y}^{2} }} - \frac{{2\rho \left( {x - x_{0} } \right)\left( {y - y_{0} } \right)}}{{\sigma_{x} \sigma_{y} }}} \right]} \right) $$
(32)

The raw pixel coordinates of the current gaze delivered by the gaze tracker are given by \( x_{0} \) and \( y_{0} \). The widths of the gaze distribution along the x- and y-axes of the screen are expressed by two standard deviations \( \sigma_{x} \) and \( \sigma_{y} \). The Pearson correlation coefficient \( \rho \) indicates the correlation between the two axes and therefore describes the shear deformation of the gaze spot. These parameters are not constant over the screen, but depend on the current gaze position \( x_{0} \) and \( y_{0} \):

$$ \begin{array}{*{20}c} {\sigma_{x} = \sigma_{x} (x_{0} ,y_{0} )} & {\sigma_{y} = \sigma_{y} \left( {x_{0} ,y_{0} } \right)} & {\rho = \rho (x_{0} ,y_{0} )} \\ \end{array} $$
(33)

We are using bilinear models to describe the dependence of the parameters on the screen position:

$$ \begin{aligned} \sigma_{x} \left( {x_{0} ,y_{0} } \right) & = \beta_{x1} + \beta_{x2} x_{0} + \beta_{x3} y_{0} + \beta_{x4} x_{0} y_{0} \\ \sigma_{y} \left( {x_{0} ,y_{0} } \right) & = \beta_{y1} + \beta_{y2} x_{0} + \beta_{y3} y_{0} + \beta_{y4} x_{0} y_{0} \\ \rho \left( {x_{0} ,y_{0} } \right) & = \beta_{r1} + \beta_{r2} x_{0} + \beta_{r3} y_{0} + \beta_{r4} x_{0} y_{0} \end{aligned} $$
(34)

The constant parameters of these bilinear models are obtained during the calibration process of the gaze tracking system, individually for each pilot. During this calibration, multiple gaze points on the screen are sampled and a least-squares problem is solved. This is done with a QR decomposition from the Eigen C++ library [39].
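As an illustration, fitting one of the bilinear models (34) to calibration samples could be done as follows with the QR decomposition from Eigen; the data layout of the calibration samples and the function name are assumptions, not the actual calibration code.

```cpp
#include <Eigen/Dense>
#include <vector>

// Least-squares fit of one bilinear model from Eq. (34), e.g.
// sigma_x(x0, y0) = b1 + b2*x0 + b3*y0 + b4*x0*y0, to calibration samples,
// solved with a column-pivoting Householder QR decomposition.
Eigen::Vector4d fitBilinear(const std::vector<double>& x0,
                            const std::vector<double>& y0,
                            const std::vector<double>& value) {
    const Eigen::Index n = static_cast<Eigen::Index>(value.size());
    Eigen::MatrixXd a(n, 4);  // one row per calibration sample
    Eigen::VectorXd b(n);
    for (Eigen::Index i = 0; i < n; ++i) {
        a(i, 0) = 1.0;
        a(i, 1) = x0[i];
        a(i, 2) = y0[i];
        a(i, 3) = x0[i] * y0[i];
        b(i) = value[i];
    }
    return a.colPivHouseholderQr().solve(b);
}
```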

Every object displayed on a screen is represented by a polygon of pixel coordinates (see Fig. 11). For determining a symbolic hypothesis \( S \) of whether the pilot is looking at this object or not, and a corresponding scalar reliability value \( Z \) (see Sect. 3.2), the integral of the gaze distribution over each single screen object is calculated.

Fig. 11. Example for estimating the reliability of a gaze tracking evidence. The left image shows an excerpt from the PFD and the right image the triangulated screen objects with the Gaussian gaze distribution on the horizontal situation indicator (green spot). (Color figure online)

$$ I = \iint_{Screen\,\,Object} {dx\,dy\,f\left( {x,y} \right)} $$
(35)

This integration is carried out numerically by decomposing the polygonal screen objects into triangles with a Delaunay triangulation algorithm [40], [41, pp. 1131–1141], and integrating over the individual triangles using a Gauss-Legendre quadrature [41, pp. 179–200]. Figure 11 shows an example of screen objects on the primary flight display (PFD).
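The following sketch illustrates the integration (35); for brevity it substitutes a symmetric three-point triangle rule (exact for quadratic integrands) for the Gauss-Legendre quadrature cited above, and the gaze distribution f is passed in as a callable.

```cpp
#include <cmath>
#include <functional>
#include <vector>

struct Point { double x, y; };
struct Triangle { Point a, b, c; };

// Integrate f over one triangle with the symmetric three-point rule:
// barycentric points (2/3, 1/6, 1/6) and permutations, each weighted area/3.
double integrateTriangle(const Triangle& t,
                         const std::function<double(double, double)>& f) {
    const double area = 0.5 * std::abs((t.b.x - t.a.x) * (t.c.y - t.a.y) -
                                       (t.c.x - t.a.x) * (t.b.y - t.a.y));
    auto at = [&](double la, double lb, double lc) {
        return f(la * t.a.x + lb * t.b.x + lc * t.c.x,
                 la * t.a.y + lb * t.b.y + lc * t.c.y);
    };
    return area / 3.0 * (at(2.0 / 3, 1.0 / 6, 1.0 / 6) +
                         at(1.0 / 6, 2.0 / 3, 1.0 / 6) +
                         at(1.0 / 6, 1.0 / 6, 2.0 / 3));
}

// Eq. (35): sum over the triangles of the Delaunay-triangulated screen object.
double integrateObject(const std::vector<Triangle>& triangles,
                       const std::function<double(double, double)>& f) {
    double total = 0.0;
    for (const Triangle& t : triangles) total += integrateTriangle(t, f);
    return total;
}
```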

With Eqs. (22) and (23) (\( x_{min} = 0 \), \( x_{max} = 1 \), \( x_{d} = 0.5, x = I \)), the symbolic value and reliability are given by:

$$ S\left( I \right) = \left\{ {\begin{array}{*{20}c} {true} & {if} & {I \ge 0.5} \\ {false} & {if} & {I < 0.5} \\ \end{array} } \right. $$
(36)
$$ Z\left( I \right) = \left\{ {\begin{array}{*{20}c} {2\left( {I - 0.5} \right)} & {if} & {I \ge 0.5} \\ { - 2\left( {I - 0.5} \right)} & {if} & {I < 0.5} \\ \end{array} } \right. $$
(37)

3.7 Touchscreen Input Detection

The multi-function displays (MFDs) in the simulator cockpit [36] are equipped with a touch-sensitive surface (Fig. 12). The pilot can interact with the MFDs by tapping on the surface with one or more fingers. Furthermore, the displays support pan, swipe and pinch gestures.

Fig. 12. Multi-touch multi-function display (MFD) in the simulator cockpit.

Touchscreen inputs may be faulty, since it is not easy to hit small buttons or symbols on the touchscreen with a finger. Therefore, an error model for touchscreen inputs is assumed. Similar to the error model of the gaze tracker, the touch point is not a single point, but rather an axially symmetric Gaussian distribution:

$$ \begin{array}{*{20}c} {f\left( {x,y} \right)} & = & {\frac{1}{{2 \pi \sigma^{2} }} exp\left( { - \frac{{r(x,y)^{2} }}{{2\sigma^{2} }}} \right)} \\ {r(x,y)} & = & {\sqrt {\left( {x - x_{0} } \right)^{2} + \left( {y - y_{0} } \right)^{2} } } \\ \end{array} $$
(38)

The measured coordinates of the touch event are \( x_{0} \) and \( y_{0} \). The parameter \( \sigma \) defines the width of this distribution and is determined during calibration as the standard deviation of a number of touch samples. The calculation for deriving the symbolic value \( S \) and the reliability \( Z \) on the basis of this distribution is the same as for the gaze tracking system (see Sect. 3.6).

3.8 Situation Assessment

Not only the current observations from monitoring the pilots are important, but also the task context given by the situation. Therefore, the tactical situation and system states are also collected to feed the evidential reasoning algorithm. Besides threats from enemy forces, we consider flight states and flight phases of the helicopter, states of the cockpit displays, as well as states of the UAVs and their sensors. Weather, time of day and other environmental quantities are neglected in the current setup. Evidences of the context are mainly used to create rejecting evidences (9). For simplicity, no real situation sensors are modeled at the moment; all observed quantities are treated as perfectly reliable observations, similar to button presses.

4 Current Status and Future Work

The method presented in Sect. 2 has been implemented in our helicopter mission simulator as described in Sect. 3. The developed software modules are part of a workload-adaptive associate system, which uses a common task model as the central representation of task-oriented knowledge [18]. To determine all possible tasks that can occur during a mission, a task analysis has been performed together with helicopter pilots and UAV operators of the German Armed Forces [42].

The task model currently contains 226 tasks in total, 85 of which are used for activity determination. The other tasks are abstract; they are used for hierarchically structuring the model and for other functions of the associate system, such as mission planning. Currently, there are about 1300 automatically generated observables. 440 of them have been linked as evidences to the 85 tasks in the task model by a knowledge engineer. With the help of an inheritance mechanism between tasks in the task model, about 730 evidences are generated in total. This results in an average of 8.6 evidences per task. Details about abstract tasks and inheritance relations are described in [18].

The total processing time of the algorithms running in the simulator is between \( 40 \) and \( 80 \) ms on a state-of-the-art personal computer and therefore complies with the soft real-time requirement stated in Sect. 2.1.

The system is currently being hardened and its parameters tuned. In the future, full mission simulations with aviators of the German Armed Forces are planned in order to evaluate this method for activity determination, as well as the closed loop with activity determination as part of the encompassing associate system.