1 Introduction

The construct of mental workload – defined as the degree to which mental resources are consumed by the task at hand – is difficult to quantify. This is due largely to the nature of the construct: a latent or hidden variable that results from an interplay of several other variables, such as the objective task load, external distractions (task-irrelevant stimuli that draw one’s attention and temporarily occupy mental resources), internal distractions (e.g. task-related stress, task-irrelevant mentation), the capacity of one’s mental resources, and the strategy of their utilization (Fig. 1). The overall capacity of mental resources and the strategy of allocating them to tasks are, in turn, strongly dependent on individual traits (e.g. personality profile, stress resiliency), previous training, and factors such as motivation, fatigue and stress. Furthermore, for any given individual, different types of mental resources, such as attention, audio-visual perception, cognition or motor control, will be mobilized to a different degree in different tasks even though the subjective perception of their ‘difficulty’ may be the same [1, 2]. An ideal measure of mental workload therefore needs to be multifaceted and diagnostic, such that it can quantify the engagement level of each mental resource before eventually combining them into a single global measure. Moreover, it should be able to model the impact of individual traits and psychophysiological states on the capacity and utilization of mental resources to a degree that does not hamper its ease of use.

Fig. 1.

Schematic presentation of the concept of mental workload and its relationship with pertinent variables: task load (TL), capacity and management of mental resources (MR, MS), individual traits and states (e.g. fatigue), and external and internal distractions (ED and ID).

The standard techniques for workload assessment include self-report scales, performance-based metrics, and physiological measures. Self-report scales are popular due to their low cost and consistency (assuming that the individual is cooperative and capable of introspection). Some of these scales are one-dimensional, such as the Rating Scale Mental Effort (RSME) and the Modified Cooper-Harper scale (MCH) [3], whereas others comprise subscales that measure specific mental resources, e.g., the NASA Task Load Index (TLX) [6], the Subjective Workload Assessment Technique (SWAT) [5], and the Visual Auditory Cognitive Psychomotor method (VACP) [4]. The major drawback of these measures is that they cannot be administered unobtrusively during the task, but are assessed retrospectively at its conclusion. Furthermore, the inherent subjectivity of self-ratings makes across-subject comparisons difficult. Self-report scales are, therefore, often complemented with performance measures, such as reaction time to different events or accuracy of responses. Performance assessment is relatively unobtrusive and can be accomplished in real time at low cost, but it is not sufficiently sensitive because of the complex relationship between performance and workload [7, 8]. Moreover, performance measures cannot tap into all cognitive resources with comparable accuracy. Lately, there has been renewed interest in physiological measures as workload assessment metrics, based on signals [9–16] such as electro-oculography (EOG), electromyography (EMG), pupil diameter, electrocardiography, respiration, electroencephalography and skin conductance. Until recently, their utility was limited by the obtrusive nature of earlier instrumentation, but this has changed with the advent of miniaturized sensors and embedded platforms capable of supporting complex signal processing techniques. Still, physiological workload measures have multiple drawbacks. First, physiological workload scales are often derived empirically on a set of tasks that are assumed to represent different workload levels and are selected ad hoc, without detailed consideration of their ecological validity and ability to tap into different mental resources (e.g., cognitive, visual, auditory, or motor workload). As a result, models trained on such atomic tasks may not perform well when applied to physiological signals acquired during other, non-atomic tasks, even though these seemingly require the same mental resources. Second, in spite of the well-known and considerable between- and within-subject variability of nearly all physiological signals and metrics, the majority of physiological workload models have been developed and validated on relatively small samples of subjects. Third, the classifiers used in the models introduced hitherto have typically lacked mechanisms for adjusting the model’s parameters to individual traits, which leads to models that do not generalize well. Finally, the models have mostly ignored the considerable amount of noise inherent in the acquired physiological signals; the poor performance of some models could thus be attributed to their reliance on a rather simple mathematical apparatus.

This paper introduces PHYSIOPRINT, a workload assessment tool based on physiological measures that is built around an established theoretical model, the Improved Performance Research Integration Tool (IMPRINT) [17]. The proposed model distinguishes among seven different workload types and is trained on tasks chosen to represent the key anchors on the respective workload scales. Its mathematical apparatus is not computationally expensive, so it is applicable in real time on a fine timescale.

The rest of the paper is organized as follows. In Sect. 2 we outline the experimental setting, while Sect. 3 reports on the experimental results. Finally, in Sect. 4, we summarize our results and give an outlook on future work.

2 Methods

2.1 IMPRINT Workload Model

The IMPRINT Workload Model, developed by the Army Research Laboratory (ARL) [17], discriminates among seven types of workload: visual, auditory, cognitive, fine motor, gross motor, speech, and tactile. Each workload type is quantified on an ordinal/interval scale, similar to the VACP scales [4]. Each of the seven scales is defined by a set of behaviors of increasing complexity, each associated with a numeric value between 0 and 7. Furthermore, for each point in time, IMPRINT produces a composite measure of the overall workload, defined as a weighted sum of the type-specific workload values calculated across all tasks that are being performed simultaneously. The model has been successfully applied to estimate mental workload in a number of settings of military relevance, including a strike fighter jet, a mounted combat system [18], and the Abrams tank [19].
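
To make the composite measure concrete, the following minimal sketch computes an IMPRINT-style composite workload as a weighted sum of type-specific demands across concurrently performed tasks. The seven workload types and the 0–7 range follow the description above; the demand values, channel weights, and task names in the example are illustrative assumptions, not values taken from IMPRINT itself.

```python
# Minimal sketch of an IMPRINT-style composite workload computation.
# The workload types and the 0-7 scale follow the model description above;
# the demand values and channel weights below are illustrative only.

WORKLOAD_TYPES = [
    "visual", "auditory", "cognitive", "fine_motor",
    "gross_motor", "speech", "tactile",
]

def composite_workload(active_tasks, weights=None):
    """Weighted sum of type-specific demands over all concurrently active tasks.

    active_tasks: list of dicts mapping workload type -> demand value (0-7).
    weights:      optional dict mapping workload type -> channel weight.
    """
    weights = weights or {t: 1.0 for t in WORKLOAD_TYPES}
    per_type = {t: 0.0 for t in WORKLOAD_TYPES}
    for task in active_tasks:
        for wtype, demand in task.items():
            per_type[wtype] += demand          # demands add across concurrent tasks
    total = sum(weights[t] * per_type[t] for t in WORKLOAD_TYPES)
    return per_type, total

# Example: driving while listening to navigation instructions (illustrative values).
driving = {"visual": 5.9, "cognitive": 4.6, "fine_motor": 2.6}
listening = {"auditory": 4.3, "cognitive": 1.2}
per_type, total = composite_workload([driving, listening])
print(per_type, total)
```

In the validated IMPRINT model the channel weights (and conflict penalties between channels) are part of the tool; here they default to 1.0 purely for illustration.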

2.2 Study Design

Twenty-two healthy subjects (11 females, 25 ± 3 years) who reported no significant previous or existing health problems participated in the study. They were required to maintain a sleep diary for 5 days prior to the experiment and to refrain from alcoholic and caffeinated beverages for 24 h before it. The experiment would typically start at 9 AM, when the attending technician would set up the subject with the sensors and recording equipment (Figs. 2 and 3). The wireless X24 sensor headset (Advanced Brain Monitoring Inc., Carlsbad, CA, USA) was used to acquire 20 channels of electroencephalography (EEG) along with electrocardiography (ECG), respiration and head movement data, while a smaller X4 device from the same manufacturer recorded forearm electromyography (EMG). Following the setup, the subject would engage in a series of computer-based auditory, visual, cognitive and memory tasks that corresponded to the key anchors of the respective workload scales from the IMPRINT model (atom tasks, Table 1). The subject would next perform a set of physical exercises on a treadmill (3 min of walking at 2 mph at 0° inclination, 3 min of running at 6 mph at 0° inclination, 3 min of walking at 2 mph at 15° inclination, 3 min of walking at 6 mph at 15° inclination) and with weights (lift-ups with 5–10 lb in each hand). The subject would then participate in a 30-min session in a driving simulator and, finally, repeat the computer-based atom tasks. The entire session was recorded with a microphone and a video camera mounted on the PC or treadmill displays in front of the participant. The protocol was approved by a local Institutional Review Board; all subjects signed an informed consent before the experiment began and were financially compensated for their participation in the study.

Fig. 2.

Sequence of experimental activities and their estimated duration

Fig. 3.

A subject during a driving simulator task (left) and computer-based atom tasks (right)

Table 1. Low workload PHYSIOPRINT tasks

2.3 Data Processing and Analyses

All computerized tasks, physical exercises and driving scenarios were scored on a second-by-second basis with respect to the workload they impose, in accord with the IMPRINT workload model [17, 18]. Each EEG channel was processed with proprietary algorithms to eliminate artifacts and derive spectral features for each consecutive 2-s data segment with 1-s (50 %) overlap. ECG signals were filtered, QRS complexes were detected, and the beat-to-beat heart rate (HR) was converted into second-by-second values. Time- and frequency-domain measures of heart-rate variability (HRV) were derived from the HR data in accord with the literature [20]. EOG signals were processed with our proprietary algorithms for the detection of eye blinks and eye fixations. EMG levels and body and limb motion were quantified in each second of the data using bin integration. In addition to these ‘absolute’ or primary variables, a number of secondary or ‘relative’ variables were derived by computing ratios and/or differences between different time instances of the same primary variable or between different but functionally or spatially related primary variables (e.g. the anterior-posterior gradient of the alpha EEG power). Finally, brain-state variables quantifying fatigue, alertness and distraction were derived using our validated classifiers [15, 16]. Step-wise regression analysis was used to identify the variables derived from the physiological signals that are most predictive of the IMPRINT workload profiles and performance. The analyses took into account the known relationships between specific workload types and certain physiological signals (e.g. the speech workload scale and respiration, or the gross motor workload and heart rate or body/limb motion).
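
As an illustration of the windowing scheme described above, the sketch below derives band-power features for consecutive 2-s EEG segments with 1-s (50 %) overlap and forms one ‘relative’ variable, an anterior-posterior alpha gradient. It is a minimal stand-in, assuming a 256-Hz sampling rate and generic channel indices; the proprietary artifact-removal and feature algorithms used in the actual pipeline are not reproduced here.

```python
# Minimal sketch of windowed EEG band-power extraction with 2-s segments and
# 1-s (50 %) overlap, assuming a 256-Hz sampling rate and illustrative channel
# indices; the proprietary artifact-removal step is not reproduced here.
import numpy as np
from scipy.signal import welch

FS = 256                    # assumed sampling rate (Hz)
WIN = 2 * FS                # 2-s analysis window
STEP = 1 * FS               # 1-s step -> 50 % overlap
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(segment):
    """Absolute band powers (area under the PSD) for one single-channel segment."""
    freqs, psd = welch(segment, fs=FS, nperseg=len(segment))
    df = freqs[1] - freqs[0]
    return {name: psd[(freqs >= lo) & (freqs < hi)].sum() * df
            for name, (lo, hi) in BANDS.items()}

def sliding_features(eeg, anterior_ch=0, posterior_ch=1):
    """Primary band powers per window plus one 'relative' variable:
    the anterior-posterior alpha gradient."""
    features = []
    for start in range(0, eeg.shape[1] - WIN + 1, STEP):
        seg = eeg[:, start:start + WIN]
        ant, post = band_powers(seg[anterior_ch]), band_powers(seg[posterior_ch])
        features.append({
            "alpha_anterior": ant["alpha"],
            "alpha_posterior": post["alpha"],
            "alpha_ap_gradient": ant["alpha"] - post["alpha"],  # secondary variable
        })
    return features

# Example on synthetic data: 2 channels, 10 s of noise -> 9 overlapping windows.
rng = np.random.default_rng(0)
feats = sliding_features(rng.standard_normal((2, 10 * FS)))
print(len(feats), feats[0])
```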

3 Results

3.1 Speech Workload Scale

The impedance-based respiration signal and the sound envelope from our X12 device sufficed for precise identification of speech episodes across the pertinent tasks (A2, A4, C3, C4, S1 and S2). Between-subject variability was not significant, and the overall classification accuracy amounted to 88.7 % (Table 2).

Table 2. Classification of speech events
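
The speech classifier itself is not detailed in this paper; as a purely illustrative sketch, the code below flags speech episodes from a sound envelope by thresholding a windowed RMS envelope against its median (non-speech) level. The sampling rate, window length, and threshold ratio are assumptions, and the actual method additionally exploits the impedance-based respiration signal.

```python
# Minimal sketch of envelope-based speech-episode detection; the sampling rate,
# window length, and threshold ratio are illustrative assumptions, not the
# values used by PHYSIOPRINT (which also uses the respiration signal).
import numpy as np

def speech_episodes(audio, fs=8000, win_s=0.05, thresh_ratio=3.0):
    """Return per-window booleans marking likely speech, via a windowed RMS envelope."""
    win = int(win_s * fs)
    n_win = len(audio) // win
    rms = np.array([np.sqrt(np.mean(audio[i * win:(i + 1) * win] ** 2))
                    for i in range(n_win)])
    baseline = np.median(rms)                 # robust estimate of non-speech level
    return rms > thresh_ratio * baseline

# Example on synthetic data: 2 s of noise with a louder 0.5-s tone burst ("speech").
fs = 8000
rng = np.random.default_rng(0)
audio = 0.01 * rng.standard_normal(2 * fs)
audio[fs:fs + fs // 2] += 0.2 * np.sin(2 * np.pi * 220 * np.arange(fs // 2) / fs)
print(speech_episodes(audio, fs).astype(int))
```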

3.2 Fine Motor Workload Scale

The EMG acquired from the forearm was a good source for the identification of fine motor activities in the pertinent tasks (B4, B5, C1, C2, A1, F1). Between-subject variability was relatively large, and normalization with respect to the baseline EMG activity (defined as the EMG activity during tasks B1 and B2) was required to obtain a classification accuracy of 86.6 % (Table 3). As one can observe, the sensitivity was high for no activity and short discrete activities (on/off EMG pattern), but there was more confusion between the continuous activities (steering wheel adjustments vs. contour tracking).

Table 3. Classification of fine motor events
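
A minimal sketch of the baseline normalization step is given below: forearm EMG is bin-integrated into one value per second and then expressed relative to the median activity recorded during the baseline tasks (B1 and B2). The window length and the synthetic data are illustrative assumptions.

```python
# Minimal sketch of baseline normalization of bin-integrated forearm EMG,
# mirroring the normalization to B1/B2 baseline activity described above;
# the window length and the toy data are illustrative assumptions.
import numpy as np

def bin_integrate(emg, fs, win_s=1.0):
    """Integrated absolute EMG per non-overlapping window (one value per second)."""
    win = int(win_s * fs)
    n = len(emg) // win
    return np.array([np.abs(emg[i * win:(i + 1) * win]).sum() for i in range(n)])

def normalize_to_baseline(task_emg_bins, baseline_bins):
    """Express task EMG as a multiple of the median baseline activity (tasks B1/B2)."""
    return task_emg_bins / np.median(baseline_bins)

fs = 1000
rng = np.random.default_rng(1)
baseline = 0.05 * rng.standard_normal(10 * fs)        # resting forearm EMG
task = 0.15 * rng.standard_normal(10 * fs)            # fine-motor task EMG
ratio = normalize_to_baseline(bin_integrate(task, fs), bin_integrate(baseline, fs))
print(ratio.round(2))
```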

3.3 Gross Motor Workload Scale

The X-, Y-, and Z-axis signals from the accelerometers within our head-worn EEG recorder and arm-worn peripheral recorder proved to be an excellent source for differentiating gross motor activities (push-ups and treadmill exercises). Between-subject variability was not significant, and the classification accuracy reached 89.3 % (Table 4).

Table 4. Classification of gross motor events
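
The sketch below illustrates the kind of per-second accelerometer features that can separate such activities: the mean and variance of the acceleration magnitude plus the dominant movement frequency within each second. The feature choice and the simulated treadmill data are assumptions for illustration, not the exact PHYSIOPRINT feature set.

```python
# Minimal sketch of per-second features from a 3-axis accelerometer; the feature
# choice (magnitude mean/variance, dominant frequency) is an illustrative
# assumption, not the exact PHYSIOPRINT feature set.
import numpy as np

def accel_features(xyz, fs):
    """One feature row per second: mean and variance of the acceleration
    magnitude, plus the dominant movement frequency within that second."""
    mag = np.linalg.norm(xyz, axis=0)            # combine X, Y, Z axes
    rows = []
    for start in range(0, mag.size - fs + 1, fs):
        seg = mag[start:start + fs]
        spectrum = np.abs(np.fft.rfft(seg - seg.mean()))
        freqs = np.fft.rfftfreq(seg.size, d=1.0 / fs)
        rows.append((seg.mean(), seg.var(), freqs[spectrum.argmax()]))
    return np.array(rows)

# Example: 5 s of simulated treadmill data at 50 Hz with a ~2 Hz step rhythm.
fs = 50
rng = np.random.default_rng(3)
t = np.arange(5 * fs) / fs
xyz = np.vstack([np.sin(2 * np.pi * 2 * t),
                 0.1 * rng.standard_normal(t.size),
                 np.ones_like(t)])
print(accel_features(xyz, fs))
```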

3.4 Auditory Workload Scale

The classifier attempted to distinguish among five conditions: ‘no activity’ (silent breaks during tasks B1–B3), ‘register a sound’ (beeps delivered throughout tasks B1–B5), ‘discriminate sounds’ (uni- vs. bilateral beeps in tasks A1 and A4), ‘interpret speech’ (digits read during tasks C2 and C4), and ‘interpret sound patterns’ (different honking patterns during the driving task). The overall classification accuracy (shown for a classifier developed on a combination of feature-vector subsets from both times of the day) amounted to 75.8 % (Table 5).

Table 5. Classification of auditory events

3.5 Visual Workload Scale

The classifier attempted to distinguish among five conditions: ‘no activity’ (silent breaks during tasks B1–B3), ‘register an image’ (tasks B4, B5), ‘detect a difference’ (task V4), ‘read a symbol’ (digits read during tasks C1 and C3), and ‘scan/search’ (task V5). The overall classification accuracy (shown again for a classifier developed on a combination of feature-vector subsets from both times of the day) amounted to 76.7 % (Table 6).

Table 6. Classification of visual events

3.6 Cognitive Workload Scale

The classifier attempted to distinguish among four conditions: ‘no activity’ (silent breaks during tasks B1–B3), ‘alternative selection’ (tasks A2 and A4), ‘encoding/recall’ (tasks C1–C4), and ‘calculation’ (task C5 and the sign task during driving). The overall classification accuracy (shown again for a classifier developed on a combination of feature-vector subsets from both times of the day) amounted to 72.5 % (Table 7).

Table 7. Classification of cognitive events

4 Discussion

The current study sought to develop a physiologically based method for workload assessment applicable in the challenging automotive setting. We addressed this need by designing a comprehensive, sensitive, and multifaceted workload assessment tool that incorporates an already established theoretical workload framework which both (1) covers the different types of workload employed in complex tasks such as driving, and (2) helps define the atomic tasks necessary for building the model. The experimental results suggested that the classifier benefits from a combination of complementary input signals (EEG and ECG), better coverage of the scalp regions by an increased number of EEG channels, inclusion of concurrent physiological measures of fatigue and alertness levels, and short-term signal history. We aimed to overcome the individual variability inherent in the physiological data by including the relative PSD variables in the feature vector. The generalization capability of the trained model was tested using leave-one-subject-out cross-validation. The proposed method demonstrated that physiological monitoring holds great promise for real-time assessment of mental workload.
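
For reference, the following sketch shows the leave-one-subject-out scheme on synthetic data, using scikit-learn’s LeaveOneGroupOut splitter with subject IDs as groups and a linear discriminant classifier as a stand-in for the actual PHYSIOPRINT model; the feature dimensions, class labels, and data are illustrative only.

```python
# Minimal sketch of leave-one-subject-out cross-validation; the LDA classifier
# and the synthetic data are illustrative stand-ins for the actual PHYSIOPRINT
# features and model.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(2)
n_subjects, n_per_subject, n_features = 22, 40, 12
X = rng.standard_normal((n_subjects * n_per_subject, n_features))
y = rng.integers(0, 5, size=X.shape[0])                    # e.g. 5 task conditions
groups = np.repeat(np.arange(n_subjects), n_per_subject)   # subject ID per sample

# Each fold trains on 21 subjects and tests on the held-out one.
scores = cross_val_score(LinearDiscriminantAnalysis(), X, y,
                         groups=groups, cv=LeaveOneGroupOut())
print(f"per-subject accuracies: {scores.round(2)}")
print(f"mean accuracy: {scores.mean():.2f}")
```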

In the future, we plan to extend the model validation to other simulated environments (a flight simulator at Systems Technology Inc.) and pertinent real environments (a fully instrumented HMMWV at the Operator Performance Laboratory at the University of Iowa). We also plan to refine the existing atom tasks, especially in the cognitive and visual areas. Alternative classification algorithms such as multi-label learning [21] will be evaluated to facilitate the process of resolving conflicts between different workload types. The classifier will, finally, be validated on a much larger sample of subjects (target N = 150).

The ultimate PHYSIOPRINT workload assessment tool is envisioned as a flexible software platform that consists of three main components: (1) an executable that runs on a dedicated local (client) machine, acquires multiple physiological signals from one or more subjects, processes them in real time, and determines global and resource-specific workload on a fine timescale; (2) a large server-based database of physiological signals acquired during relevant atomic tasks from a large number of subjects with different socio-demographic and other characteristics (e.g., degree of driving experience); and (3) a palette of real-time signal processing, feature extraction, and workload classification algorithms. The platform will support a number of recording devices from a wide range of vendors (via the appropriate device drivers) and enable visualization of the workload measures. Users will essentially be able to build their own workload assessment methods from the available building blocks of feature extraction methods and implemented classifiers. Initially, the database will include 100–150 subjects, but we envision that it will continue to evolve as the community grows in the following years.
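
The sketch below is a purely hypothetical illustration of such a building-block design: feature extractors and classifiers register themselves in simple registries, and a user composes a pipeline by name. None of the names constitute an actual PHYSIOPRINT API; they are placeholders for the kinds of components described above.

```python
# Hypothetical sketch of the envisioned building-block architecture: users pick
# a feature extractor and a classifier from registries and compose a pipeline.
# All names and registries here are illustrative, not an actual PHYSIOPRINT API.
from typing import Callable, Dict, List

FEATURE_EXTRACTORS: Dict[str, Callable] = {}
CLASSIFIERS: Dict[str, Callable] = {}

def register(registry: Dict[str, Callable], name: str):
    """Decorator that adds a building block to the given registry."""
    def wrap(fn):
        registry[name] = fn
        return fn
    return wrap

@register(FEATURE_EXTRACTORS, "band_power")
def band_power_features(signal_window: List[float]) -> List[float]:
    # placeholder: a real extractor would compute PSD band powers per channel
    return [sum(abs(x) for x in signal_window) / max(len(signal_window), 1)]

@register(CLASSIFIERS, "threshold")
def threshold_classifier(features: List[float]) -> str:
    # placeholder: a real classifier would output resource-specific workload levels
    return "high_visual" if features[0] > 0.5 else "low_visual"

def build_pipeline(extractor_name: str, classifier_name: str):
    """Compose a per-window workload estimator from two registered blocks."""
    extract, classify = FEATURE_EXTRACTORS[extractor_name], CLASSIFIERS[classifier_name]
    return lambda window: classify(extract(window))

pipeline = build_pipeline("band_power", "threshold")
print(pipeline([0.2, -0.9, 0.7, 0.4]))
```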