1 Introduction

Roughly 50 years after the introduction of the IBM System/360, the way we interact with computers has changed immensely. Communicating with a computer once required humans to learn a specific, limited set of commands that, when issued to the machine, produced a predetermined effect. Today, computer technology is moving rapidly towards more natural models of interaction with humans. This shift is driven by the rapid development of sensing technology and by the improvement of algorithms that interpret the acquired signals. Sensing technology not only serves explicit interaction between computers and our environment (as in smart cities, houses, vehicles, etc.), but also opens a novel way of understanding humans, as it is deployed to monitor our behaviors and mental states. Ultimately, computers now act as a link between humans and their environment.

Vinton Cerf states that in a world of humanoid and functional robots, smart cities, smart dwellings, and smart vehicles, we cannot disregard the notion of instrumented and augmented bodies [1]. By enabling computers to sense human neural states and behavior, we can also enable them to build dynamic user-state representations and to respond in a context-specific way to changes in actual human mental states (user states). One way of achieving this is by expanding the conventional, explicit modelling in HCI with implicit interaction [2]. Implicit HCI assumes that the actions performed by the user are not primarily aimed at interacting with a computerized system, yet the system may still interpret those actions as input [2].

A fertile ground for the introduction of implicit interaction can be found in the industrial workplace. Although industry has been striving towards “lights-out” manufacturing [3] (i.e., fully automated factories) for decades, many industrial processes still rely on human operators [4], who are often characterized as the most fallible element in the production line [5, 6]. The main cause is humans’ limited mental and physical endurance, which can sometimes make behavior and reactions unpredictable [6]. Our motivation is to develop an automated system capable of detecting a drop in mental and physical performance so that appropriate action (e.g., a break or a change of task) can be taken to prevent errors and improve the productivity and quality of manual tasks. In this study, we analyzed workers’ neural (electroencephalography, EEG) and behavioral (reaction times, RTs, and the quantity of task-unrelated movements) signals in order to interpret the implicit multimodal interaction [7] between the worker and the workplace in manual assembly tasks. The ultimate goal is a system able to detect mental strain online and monitor attention fluctuations, thereby preventing operating errors [8] and improving the worker experience.

2 State-of-the-Art

We approach the problem of on-line monitoring of workers’ attention by analyzing the relationship between brain dynamics and active behavior during the execution of work activities [9]. This is done by recording brain activity with an unobtrusive wearable EEG, in parallel with motion-capture sensors, in a naturalistic industrial environment. Although industry has been considering the use of wearables for over a decade [10], the majority of their applications are still oriented towards explicit interaction, providing workers with information about their task [11] or augmenting reality [12], rather than towards collecting and exploiting data about the task being performed or the worker performing it.

The only available and reliable tool for direct brain-activity monitoring in a naturalistic workplace environment is wearable EEG [13]. Nowadays, EEG research is mainly oriented towards Brain-Computer Interfaces (BCIs), which use brain activity to allow humans to interact with computers without any physical contact or verbal exchange of commands [14]. BCI research has already had some success in medical applications, mainly in helping people regain the lost ability to move a certain body part. Moving away from this primary usage, however, a novel direction in BCI (passive BCI) is oriented towards continuous analysis of the recorded brain signals during human-machine interaction, with the aim of objectively assessing user states [15].

The clear momentum of passive BCI technology [15] has recently opened new doors to applications in industry, empowering the research area of neuroergonomics [16]. This emerging scientific field merges classical ergonomics methods with neuroscience, exploiting the benefits of both [16]. Mainly, it provides precise analytical parameters of individuals’ work efficiency by investigating the relationship between neural and behavioral activity [17]. The advantage of this approach is that it avoids unreliable conclusions about workers’ cognitive states based only on theoretical constructs [17]. Now that EEG sensors have become wearable, it is finally possible to pursue the ultimate goal of neuroergonomics and examine how the brain carries out complex tasks in real working environments [16]. Specific EEG features that can be used to estimate human attention level and cognitive engagement are event-related potentials (ERPs) and the Engagement Index (EI), respectively.

ERPs are voltage fluctuations of the EEG signal that are time-locked to a specific event (stimulus) [18], and their components are defined by their polarity and latency from event onset. The P300 ERP component, for instance, is a positive deflection that occurs approximately 300 ms after stimulus presentation. The P300 is most prominent over the central and parieto-central scalp locations (Fz, Cz, CPz and Pz; central portion of Fig. 3) [18]. It is largely accepted that the magnitude of the P300 peak directly correlates with a person’s attention level: higher P300 amplitudes correspond to a more attentive state [18].

A person’s cognitive engagement can be measured from the EEG signal through the EI. Brain rhythms are usually investigated through four distinct frequency bands: delta (δ = 0–4 Hz), theta (θ = 4–8 Hz), alpha (α = 8–12 Hz) and beta (β = 12–30 Hz). Low-frequency waves are usually high in amplitude and dominate during rest, relaxation, sleepiness, low alertness, etc. Conversely, high-frequency, low-amplitude waves reflect alertness, wakefulness, task engagement, etc. The EI is the ratio between the high-frequency band power (β) and the sum of the low-frequency band powers (α + θ), i.e., EI = β/(α + θ). A higher EI therefore indicates higher engagement in the task, whereas low EI values indicate that the person is not actively engaged with any aspect of the environment during the task [19].

Apart from direct observation of brain functions with neuroimaging techniques, the user state can also be assessed with behavioral measurements. In the early stages of experimental psychology, for instance, researchers relied mostly on behavioral measures (e.g., reaction times, RTs) to estimate cognitive state. RTs reflect various cognitive processes and are recorded simply by measuring the time elapsed from stimulus presentation to the initiation of the required action. Although RTs reflect cognitive processes to an extent, they lack temporal precision and fail to provide deeper insight into the underlying brain activity [20].

Another measurable aspect of human behavior is body movement [9]. Research has indicated that variability in movement not directly related to the task could be an important indicator for user-state assessment [21]. Behavioral analyses of movements are usually carried out off-line: researchers typically record participants with an RGB camera and then perform manual analysis, which mostly comes down to counting the number of different types of movements [21]. Advances in HCI and computer-vision technology now allow on-line, automated processing of these movements. Structured-light technology in unison with additional sensors, as found in the Microsoft Kinect™, opens the possibility of automatically acquiring information on behavioral activity. The Kinect™ interprets the human body as a stick figure, in which the joints (e.g., elbow, shoulder) are represented as key-points that can be retrieved in real time. This enables simple behavioral models based on movement energy (ME), which we propose and describe further in the text.

Combining neural and behavioral modalities can yield a deeper understanding of human mental states during complex work activities [9]. Until very recently, research investigating the relationship between brain dynamics and human behavior was confined to strictly controlled laboratory conditions, the obtrusiveness and immobility of EEG and motion sensors being the main culprits. As the technology matured, however, EEG finally became wearable, enabling experiments in realistic workplace conditions. In order to investigate the possibility of implicit interaction between worker and workplace, we developed a replicated workplace equipped with computing entities capable of sensing workers’ neural states and interpreting their behavioral activities. We named such a workplace the “sensitive workplace”.

3 Methods

3.1 Participants

Six participants were engaged in the study. All participants had normal or corrected-to-normal vision. They agreed to participate and signed an informed consent after reading the experiment summary. The study was approved by the Ethical Committee of the University of Kragujevac.

Participants started with a 15-minute training session, after which they confirmed their readiness to participate in the study. The experiment consisted of two tasks, each lasting around 90 min, separated by a 15-minute break (the total duration of the experiment was about 4 h). The task order was counterbalanced across participants.

3.2 Replicated Workplace

We replicated the physical workplace of an automotive sub-component manufacturing company, where we simulated the assembly of the hoses used in vehicle hydraulic brake systems (see Fig. 1).

Fig. 1. Real-life workplace (left) compared to the replicated laboratory workplace (right)

The operation was divided into six sub-steps: (1) picking the rubber hose (blue box, to the participant’s right); (2) picking the metal extension to be crimped onto the hose (yellow box, to the participant’s left); (3) placing the metal extension on the rubber hose; (4) placing the unassembled part in the improvised machine (white box in front of the participant); (5) pressing the foot-pedal switch with the right foot to initiate the simulated crimping process; (6) removing the assembled part from the machine and placing it in the box with the assembled parts (grey box in front of the participant).

3.3 Sensitive Workplace Architecture

A combination of sensing technologies was installed in the replicated workplace to acquire neural and behavioral data. Figure 2 depicts the system architecture of the resulting sensitive workplace environment.

Fig. 2. Sensitive workplace system architecture

For neural data, we opted for wireless EEG signal acquisition using the SMARTING system (mBrainTrain, Serbia). SMARTING is a small, lightweight EEG amplifier tightly connected to the EEG recording cap (EasyCap, Germany), which minimizes movement-related artefacts and makes it usable in real-life environments (Fig. 2, upper left corner).

The movement data were acquired with a Microsoft Kinect™ mounted above and in front of the person. The human body is tracked using structured-light technology and interpreted as a stick figure. Since the device comes with a software development kit (SDK), we were able to develop a standalone motion-acquisition module capable of simultaneously recording and streaming the data (Fig. 2).

When acquiring neural and behavioral signal modalities in real-world environments, precise synchronization between multiple sensors is a major challenge. It is even more demanding when the sensors differ in both data type and sampling rate, as with EEG, RTs and movement data. For example, during the acquisition of EEG signals, and particularly for the extraction of ERPs, millisecond precision in data synchronization is mandatory. The problem becomes especially prominent with wireless technologies, where physically grouping the sensors or providing a common reference signal is not feasible.

To deal with this issue, we used the open-source platform “Lab Streaming Layer” (LSL, https://github.com/sccn/labstreaminglayer). LSL is a real-time data collection and distribution system capable of synchronously streaming multiple multi-channel data streams that are heterogeneous in both type and sampling rate [9, 22] to the recording program “Lab Recorder” (bottom central panel, Fig. 2). LSL has a built-in synchronized time facility for all recorded data and can achieve sub-millisecond accuracy on computers connected to a local area network (LAN) [22].
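
As an illustration, the following minimal Python sketch shows how a producer (e.g., our motion-acquisition module) and a consumer (e.g., Lab Recorder) could exchange a time-stamped stream through LSL’s pylsl bindings; the stream name, channel layout and sampling rate are placeholder assumptions, not the exact configuration used in the study.

```python
from pylsl import StreamInfo, StreamOutlet, StreamInlet, resolve_byprop

# Producer side: advertise a motion-capture stream on the LAN.
# 27 channels = 9 key-points x 3 axes (hypothetical layout).
info = StreamInfo(name='KinectKeypoints', type='MoCap',
                  channel_count=27, nominal_srate=30,
                  channel_format='float32', source_id='kinect01')
outlet = StreamOutlet(info)
outlet.push_sample([0.0] * 27)  # one frame of key-point coordinates

# Consumer side: discover the stream and pull time-stamped samples.
streams = resolve_byprop('type', 'MoCap', timeout=5.0)
inlet = StreamInlet(streams[0])
sample, timestamp = inlet.pull_sample()

# time_correction() estimates the offset between the sender's clock and
# the local LSL clock; applying it aligns heterogeneous streams.
aligned_time = timestamp + inlet.time_correction()
print(f"sample received at LSL time {aligned_time:.4f}")
```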

In order to elicit ERPs from the continuous EEG recording, we provided visual stimulation to the subjects (explained in detail in Sect. 3.4). For this we used the Simulation and Neuroscience Application Platform (SNAP, available at https://github.com/sccn/SNAP), which is capable of real-time experimental control and is compatible with LSL. SNAP also supports the interpretation of actions retrieved from various input devices.

3.4 Experimental Task

We conducted an experimental study using the sensitive workplace, with the aim of investigating the relationship between EEG and behavioral modalities. One goal of the experiment was to determine whether RTs and ME (in combination with EEG) can provide reliable attention-monitoring results. We subjected participants to a change of task during the simulated assembly in order to investigate how changes in mental workload alter workers’ attention levels. The ultimate goal is to propose a system for the on-line measurement of workers’ attention in industrial environments.

Participants in this study sat in a chair in front of the improvised machine (right panel of Fig. 1) while performing the simulated assembly task. In order to investigate the time-locked features of the neural signals (ERPs), two validated psychological tests of cognitive ability were presented to participants on a 24″ screen at a distance of approximately 100 cm (the task specifications were programmed in SNAP). The tests were a modified Sustained Attention to Response Task (SART) and the Arrow task.

The SART paradigm represents the ‘go/no-go’ task. The numbers ranging from ‘1’ to ‘9’ are presented to participants in random order, where they are required to initiate the action, with the exception if the number ‘3’ appears on the screen. Therefore, numbers other than ‘3’ are target stimuli and the probability of the appearance of the target stimuli was set to 90%. The Arrow task is also a ‘go/no-go’ task, where participants are required to initiate the action once the white arrow appears on the screen (also a target stimulus, with 90% probability of appearance), whereas they should withhold the action if the red arrow appears.
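
To make the stimulus logic concrete, here is a schematic Python sketch of the SART sequence as described above; it illustrates the paradigm only and is not the actual SNAP script used in the experiment.

```python
import random

NO_GO = 3       # participants must withhold the response to '3'
P_TARGET = 0.9  # 'go' (target) stimuli appear with 90% probability

def next_sart_stimulus():
    """Draw the next digit: a non-'3' target with 90% probability,
    otherwise the no-go digit '3'."""
    if random.random() < P_TARGET:
        return random.choice([d for d in range(1, 10) if d != NO_GO])
    return NO_GO

# Example: generate a short stimulus sequence.
print([next_sart_stimulus() for _ in range(20)])
```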

The main difference between the SART and the Arrow task was the level of mental workload imposed on participants. The SART is a monotonous psychological test, making it suitable for investigating the neural correlates of attention decline; in this task, participants could freely choose the hand with which to initiate the action. In the Arrow task, by contrast, we imposed a slightly higher workload: participants were instructed to alternate hands according to the direction of the white arrow presented on the screen, so the arrow’s direction determined which hand executed each action.

3.5 Sensing the Operators’ State

In order to estimate the user state from the EEG signals, we extracted and analyzed specific ERP features and the Engagement Index (EI). The behavioral modalities, RTs and participants’ ME, were analyzed together during the periods when participants were not physically engaged with the task. Finally, we investigated the relationship between the attention- and cognitive-engagement-related behavioral and neural modalities. The methodology is outlined in Fig. 3 and explained below.

Fig. 3. Methodology outline: (a) Engagement Index equation; (b) visualisation of the P300 window; (c) Motion Energy equation. The central segment presents the joint positions (key-points) used for motion analysis (Left/Right Palm, Wrist, Elbow and Shoulder, plus the Head) and the positions of the EEG electrodes used (Fz, Cz, CPz and Pz)

P300 and EI – Attention-Related EEG Modalities

In order to calculate the P300 component’s amplitude, the EEG signal was first bandpass filtered between 1 and 35 Hz, then re-referenced to the channels at the mastoid locations, followed by removal of eye-movement and muscle artifacts using Independent Component Analysis (ICA) [22]. Finally, the signal was segmented into epochs from −200 to 800 ms around the timestamps of stimulus presentation. We used the mean peak amplitude measure: the P300 peak amplitude was calculated as the mean value within a 230–450 ms window following stimulus onset (shaded section in the upper right corner of Fig. 3).
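
This pipeline maps closely onto standard EEG tooling. Below is a minimal sketch using the open-source MNE-Python library; the file name, ICA component indices and event extraction are placeholders that depend on the concrete recording setup, not values from the study.

```python
import mne

# Load a raw recording (file name is a placeholder).
raw = mne.io.read_raw_fif('assembly_session_raw.fif', preload=True)

# 1-35 Hz bandpass, then re-reference to the mastoid channels.
raw.filter(l_freq=1.0, h_freq=35.0)
raw.set_eeg_reference(ref_channels=['M1', 'M2'])

# ICA-based removal of ocular and muscle artifacts; the components to
# exclude (here 0 and 1) must be identified per recording.
ica = mne.preprocessing.ICA(n_components=20, random_state=42)
ica.fit(raw)
ica.exclude = [0, 1]
ica.apply(raw)

# Epoch -200..800 ms around stimulus onsets and average over trials.
events = mne.find_events(raw)  # assumes a trigger/marker channel
epochs = mne.Epochs(raw, events, tmin=-0.2, tmax=0.8,
                    baseline=(-0.2, 0.0), preload=True)
evoked = epochs.average()

# Mean amplitude in the 230-450 ms window at Pz as the P300 measure.
p300 = evoked.copy().pick('Pz').crop(tmin=0.23, tmax=0.45).data.mean()
print(f"P300 mean amplitude at Pz: {p300:.2e} V")
```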

We further analyzed the Engagement Index (EI) [19]. In order to quantify the power contained in the different signal bands, bandpass filtering was applied in three frequency bands (θ, α and β), followed by re-referencing of the signal and artifact removal with ICA [23]. The EEG signal was then segmented according to the timestamps of stimulus appearance, and the 1-s segment preceding each stimulus was used for further analysis. Finally, the signal power spectral densities (PSDs) were calculated for each frequency band and the EI was computed according to the equation \( EI = \beta /(\alpha + \theta ) \), graphically represented in the upper left corner of Fig. 3.
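
Under this definition, the EI of a pre-stimulus segment reduces to a ratio of Welch band powers; a minimal sketch follows, in which the 500 Hz sampling rate is an assumed value rather than the amplifier’s actual rate.

```python
import numpy as np
from scipy.signal import welch

def engagement_index(segment, fs=500.0):
    """EI = beta / (alpha + theta) for a single-channel EEG segment
    (e.g., the 1-s window preceding a stimulus)."""
    freqs, psd = welch(segment, fs=fs, nperseg=min(len(segment), int(fs)))

    def band_power(lo, hi):
        mask = (freqs >= lo) & (freqs < hi)
        return np.trapz(psd[mask], freqs[mask])  # integrate the PSD

    theta = band_power(4.0, 8.0)
    alpha = band_power(8.0, 12.0)
    beta = band_power(12.0, 30.0)
    return beta / (alpha + theta)

# Example with synthetic data: 1 s of noise at 500 Hz.
rng = np.random.default_rng(0)
print(engagement_index(rng.standard_normal(500)))
```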

Reaction Times and Motion Energy – Attention-Related Behavioral Modalities

Reaction times are a recognized tool for estimating the level of attention: shorter RTs are often considered an indicator of a more attentive state, except in the case of a speed-accuracy trade-off. We calculated RTs as the time elapsed between stimulus presentation and the beginning of the machine crimping action (i.e., between steps 1 and 5 explained in Sect. 3.2).
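
Since all event markers share the common LSL clock, the RT computation itself is a single subtraction per trial, as in this trivial sketch with hypothetical timestamps:

```python
# Timestamps in seconds on the shared LSL clock (hypothetical values).
stimulus_ts = 1024.317  # 'go' stimulus marker (step 1)
pedal_ts = 1025.092     # pedal-switch marker (step 5)

rt_ms = (pedal_ts - stimulus_ts) * 1000.0
print(f"RT = {rt_ms:.0f} ms")  # -> RT = 775 ms
```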

The quantity of task-unrelated movements is the other behavioral modality we analyzed. We measured the amount of movement from the moment participants deposited an assembled part (step 6 in Sect. 3.2) until the successive ‘go’ stimulus to perform the task (step 1 in Sect. 3.2), a period in which participants were expected to sit still. To quantify these movements, we analyzed the key-point data provided by the Kinect™ sensor. A seated model of the person was used (joints indicated in the central portion of Fig. 3), since the machine occludes the lower portion of the body. We calculated the kinetic energy of movement [24] for each key-point along the three axes (Eq. 1), and the final energy of each key-point as the sum of the energies over the axes (Eq. 2); a sketch of this computation is given after the equations.

$$ \frac{\partial E_{x} }{\partial t} = \dot{x}\,\ddot{x};\quad \frac{\partial E_{y} }{\partial t} = \dot{y}\,\ddot{y};\quad \frac{\partial E_{z} }{\partial t} = \dot{z}\,\ddot{z} $$
(1)
$$ {\text{ME}} = \mathop \sum \limits_{i = 1}^{n} \left[ {\left( {x_{i + 1} - x_{i} } \right)\left( {x_{i + 2} - 2x_{i + 1} + x_{i} } \right) + \left( {y_{i + 1} - y_{i} } \right)\left( {y_{i + 2} - 2y_{i + 1} + y_{i} } \right) + \left( {z_{i + 1} - z_{i} } \right)\left( {z_{i + 2} - 2z_{i + 1} + z_{i} } \right)} \right] $$
(2)
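
A direct NumPy implementation of Eq. (2) for one key-point trajectory might look as follows (a sketch under our naming, not production code):

```python
import numpy as np

def movement_energy(traj):
    """Movement energy of one key-point per Eq. (2).
    `traj` is an (n, 3) array of x, y, z positions over time."""
    vel = np.diff(traj, n=1, axis=0)  # (x_{i+1} - x_i), etc.
    acc = np.diff(traj, n=2, axis=0)  # (x_{i+2} - 2x_{i+1} + x_i), etc.
    # Pair vel[i] with acc[i], then sum over time and the three axes.
    return float(np.sum(vel[:len(acc)] * acc))

# Example: energy of a synthetic wrist trajectory (100 frames).
rng = np.random.default_rng(0)
print(movement_energy(rng.standard_normal((100, 3))))
```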

3.6 Statistical Analysis

We conducted an off-line data analysis in order to investigate the relationship between the neural and behavioral attention-related modalities. First, we computed Spearman’s correlation, mainly to investigate whether any of the four attention-related modalities reveals a decline in attention and cognitive engagement as the task progresses; that is, with the Spearman correlation we examined the general trend of each modality over the time course of the task. We then computed Pearson’s correlation between all the modalities recorded in the study, with the aim of comparing RTs and ME to the EEG data.
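
With SciPy, both analyses reduce to a few calls. The sketch below uses synthetic per-trial series as stand-ins for the recorded modalities (all variable names and values are hypothetical):

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

rng = np.random.default_rng(0)
n_trials = 200

# Synthetic per-trial series: a P300 amplitude drifting down and an
# ME drifting up with time-on-task (illustrative only).
p300_amp = rng.normal(5.0, 1.0, n_trials) - 0.01 * np.arange(n_trials)
me = rng.normal(1.0, 0.2, n_trials) + 0.005 * np.arange(n_trials)

# Spearman: monotonic trend of a modality over time-on-task.
rho, p_rho = spearmanr(np.arange(n_trials), p300_amp)

# Pearson: linear relationship between a behavioral and a neural modality.
r, p_r = pearsonr(me, p300_amp)

print(f"trend rho = {rho:.2f} (p = {p_rho:.3g}); "
      f"ME vs P300 r = {r:.2f} (p = {p_r:.3g})")
```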

4 Results and Discussion

The results of the Spearman correlation are shown in the upper panel of Fig. 4 (a “+”/“−” sign marks a significant positive/negative correlation (trend) at p < .05, while an empty field marks a statistically non-significant value). The results revealed that in the monotonous (SART) task, the behavioral activity (ME) increased while the P300 amplitude and EI decreased as the experiment progressed, regardless of the order of task presentation. The Spearman correlation further revealed that in the more demanding (Arrow) task the results depended on the order of presentation: they were identical to those of the SART task when the Arrow task was presented first. However, when the Arrow task was presented second, the P300 amplitude increased as the task progressed, while ME and EI decreased. It is noteworthy that RTs were independent of both task type and task order and decreased with time-on-task, probably owing to a practice effect as the task progressed.

Fig. 4. Results of the experimental study. Upper table: Spearman’s correlations of elapsed task time with neural and behavioral factors. Lower table: Pearson’s correlations between behavioral and neural factors. Fields with a “+”/“−” sign represent significant positive/negative correlations (p < .05), while empty fields represent statistically non-significant results (p > .05). Fz, Cz, CPz and Pz are the electrode sites from which P300 amplitudes and EI were calculated. The rows of the lower table are the key-point locations derived from the Kinect, explained in Fig. 3; the last rows are the reaction times (RTs).

From the Spearman correlation results (upper panel of Fig. 4), it can be inferred that the monotonous task (SART) induces attention decline regardless of task order: the P300 amplitude and EI declined with time-on-task, while ME increased as the task progressed. In contrast, the results for the more mentally demanding task (Arrow) depended on presentation order. This is especially notable for the P300 amplitude, which increased during the task when the Arrow task followed the SART. Although the EI still decreased, showing that participants’ mental engagement declined during the task, the P300 amplitude indicates that participants were able to maintain a higher attention state. The same pattern is visible in the ME: only when the Arrow task came second did ME decrease with elapsed time, i.e., participants made fewer task-unrelated movements.

The bottom part of Fig. 4 depicts the Pearson correlation results. The expected negative correlation between P300 amplitudes and ME is more pronounced in the low-demand, monotonous (SART) task than in the more mentally demanding (Arrow) task. This finding is not surprising, as the existing literature links the quantity of task-unrelated movements to attention decline in monotonous tasks [21]. Further, when the more monotonous task was presented first, the EI was negatively correlated with ME for each key-point, while in the more demanding task almost no correlations were found between the neural and behavioral attention-related modalities. Finally, when the Arrow task was presented first, the only negative correlations with the P300 amplitude were at the LP, LW, RP and RW key-points, while the EI was positively correlated with ME at almost all key-points. This could be explained through the notion of re-activation: in a more mentally demanding task, participants use task-unrelated movements to re-activate attention-related resources in the brain [18], thus staying more focused on the task. This pattern was absent when the SART followed the Arrow task; there, once again in the more monotonous task, the P300 amplitude was negatively correlated with ME at the majority of key-points. From all these results we can infer that during low-demand, monotonous tasks, task-unrelated ME is negatively correlated with the attention level.

The presented results support our intention to monitor operators’ attention levels by synchronously recording and analyzing behavioral and EEG modalities with a relatively simple, low-cost, unobtrusive sensor network. An obvious limitation, however, is that the attention analysis was not performed on-line; this is planned for future studies. The next steps will include the development of algorithms for automated, real-time acquisition and analysis of the presented modalities, which we could then deploy in a factory environment for sensing the user state. Such a system could ultimately increase overall workers’ wellbeing.

5 Conclusion

Monotonous and repetitive tasks, commonly seen on manual assembly production lines, often lead to mental strain due to humans’ limited mental and physical endurance. Our work focused on exploiting advances in neural and behavioral sensing technology to detect user states that indicate the onset of attention decline and mental fatigue. The final goal is to prevent the errors caused by attention decline and mental fatigue, which might lead to product waste or injuries.

We have shown that neural and behavioral markers together can provide a more detailed insight into human attention levels. This was done in a realistic workplace environment and represents a first step towards the described implicit-HCI paradigm. ME, which can be analyzed in real time, is less obtrusive than EEG and may provide a reliable stand-alone tool for attention monitoring, especially in industrial scenarios. An obvious follow-up is to process these features in real time and place them in a feedback loop, with some form of indication communicated to the workers. That way, a person would be informed about attention drops in close to real time, which could prevent errors and dangerous consequences. This could then become the basis of a true implicit human-computer interaction of the future.