ATHENA – A Zero-Intrusion No Contact Method for Workload Detection Using Linguistics, Keyboard Dynamics, and Computer Vision
Abstract
We describe preliminary evaluation data for ATHENA (Appraisal of Task Health and Effort through Non-Intrusive Assessments), a completely no-contact, zero-intrusion workload measurement method that harnesses multimodal metrics (e.g., linguistic markers, keyboard dynamics, and computer vision). Preliminary results reflect the existence of different types of workload, with our zero-intrusion metrics demonstrating respectable classification accuracies when the variable causing workload (e.g., time) is matched with the type of workload assessed (e.g., temporal). By not requiring extra equipment or interrupting workflow, ATHENA represents a valuable step forward in providing automated workload support tools as well as a tool for understanding the workload concept.
Keywords
Human factors · Zero-intrusion workload measure · Linguistic analysis · Cognitive workload · Machine learning · Keyboard dynamics
1 Introduction
The current work discusses the initial validation and preliminary results for ATHENA (Appraisal of Task Health and Effort through Non-Intrusive Assessments), a workload sensor that automatically assesses and evaluates human workload. Once the workload level is known, performance can be optimized through adaptable automation [1] and task scheduling. Our machine-learning-enabled software sensor uses a variety of human behavioral features (such as linguistic analysis, keyboard dynamics, and computer vision), all obtained with zero intrusion and at little cost, since the underlying behaviors contributing to the metrics are naturally exhibited during task completion.
ATHENA is ideally suited for NASA’s expected long-duration space missions as well as other high-criticality domains due to its zero-intrusion nature and use of a variety of metrics. By collecting naturally occurring behavioral metrics as well as information obtained through no-contact sensors, ATHENA allows workload estimates to be obtained without modification or interruption of workflow, which can affect the crew’s workload and confound results (as seen with self-reports or additional equipment attached to the operator [2]). The variety of metrics allows an appropriate subset to be applied as the current context permits. This is important because the wide variety and multimodal nature of tasks make some metrics useful during some tasks but not others (e.g., keyboard dynamics are not useful when no typing is occurring). We have shown that a subset of our metrics can provide accurate classification, but the best classification rates are obtained when all available metrics are used [3].
2 Materials and Methods
2.1 Surveys
Ground-truth workload data were obtained through surveys administered after the completion of each game. We used the Bedford Scale as a uni-dimensional rating scale to measure spare mental capacity [4]. This hierarchical scale guides users through a ten-point decision tree, with each point having an accompanying descriptor of the workload level. For classification purposes, we divided the Bedford scale into four levels following the natural divisions provided by the scale itself: 1–3, 4–6, 7–9, and 10. We also used the NASA-TLX as a multi-dimensional rating scale to provide additional diagnostic information about experienced workload [5]. We divided the TLX into three levels such that 33% of the data fell into each of a low, medium, and high category. This provided the discrete categories needed for classification while preserving the nature of the TLX as a way to determine relative workload levels.
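As a rough illustration of this discretization (a minimal sketch in Python/pandas with hypothetical column names and placeholder scores, not the authors' preprocessing code), the Bedford levels can be formed with fixed bins and the TLX levels with a tertile split:

```python
import pandas as pd

# Hypothetical per-game survey scores; column names and values are illustrative only.
scores = pd.DataFrame({
    "bedford": [2, 5, 7, 3, 9, 10, 4, 6],
    "tlx":     [22.5, 48.0, 71.3, 30.1, 85.6, 92.4, 40.7, 55.2],
})

# Bedford: four levels following the scale's natural divisions (1-3, 4-6, 7-9, 10).
scores["bedford_level"] = pd.cut(
    scores["bedford"],
    bins=[0, 3, 6, 9, 10],
    labels=["1-3", "4-6", "7-9", "10"],
)

# NASA-TLX: tertile split so roughly a third of the data falls in each category.
scores["tlx_level"] = pd.qcut(scores["tlx"], q=3, labels=["low", "medium", "high"])
```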
2.2 Procedure
[Table: ATHENA pilot test conditions. Condition 1 served as the baseline for all comparisons; Conditions 2 and 3 manipulated mental workload, Conditions 5 and 6 manipulated temporal workload, and Condition 4 added noise.]
We developed proprietary software to collect and analyze keyboard and mouse dynamics, and augmented techniques available in open-source software to derive heart rate from the RGB video stream. The collected data were processed to obtain the desired metrics, such as heart rate [8], typing pauses and errors [9], and task performance [10]. Metrics were chosen based on a literature review and internal brainstorming.
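To illustrate the kind of no-contact heart-rate derivation described above, the following is a minimal remote-photoplethysmography sketch using OpenCV and NumPy; the paper does not specify the authors' exact pipeline, so the face-detection step, green-channel averaging, and frequency band below are assumptions:

```python
import cv2
import numpy as np

def estimate_heart_rate(video_path):
    """Rough heart-rate estimate (BPM) from the mean green-channel intensity of the face region."""
    face_detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0

    signal = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        faces = face_detector.detectMultiScale(
            cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 1.3, 5)
        if len(faces) == 0:
            continue
        x, y, w, h = faces[0]
        # The green channel carries the strongest photoplethysmographic signal.
        signal.append(frame[y:y + h, x:x + w, 1].mean())
    cap.release()

    # Dominant frequency of the detrended signal within a plausible pulse band.
    sig = np.asarray(signal) - np.mean(signal)
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fps)
    power = np.abs(np.fft.rfft(sig)) ** 2
    band = (freqs >= 0.75) & (freqs <= 3.0)  # 45-180 BPM
    return freqs[band][np.argmax(power[band])] * 60.0
```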
3 Results
We used simple linear regression as our supervised machine-learning approach, via an interface with the WEKA toolkit [11], to classify each game played. Each participant played six games, and each game was divided into thirds for analysis. We performed 10-fold cross-validation using the survey scores as classification targets. We expected the total TLX to best classify all games, the Bedford scale and the TLX mental subscale to best classify our Mental Low/Baseline/High conditions, and the TLX temporal subscale to best classify our Baseline/25/45 time limits. Noise was included as an exploratory variable.
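For a concrete picture of this evaluation setup, the following is an analogous sketch in Python using scikit-learn rather than the WEKA interface the authors describe; ordinary least-squares regression stands in for WEKA's simple linear regression, and the feature matrix and Bedford ratings are synthetic placeholders. It regresses on the raw survey score, maps predictions back onto the discrete levels, and reports 10-fold cross-validated agreement:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

# Hypothetical feature matrix: one row per game segment, columns are
# zero-intrusion metrics (e.g., typing pause statistics, heart rate, task performance).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))            # placeholder features
bedford = rng.integers(1, 11, size=120)  # placeholder Bedford ratings (1-10)

# Discretize the survey scores into the levels used as classification targets.
levels = np.digitize(bedford, bins=[3, 6, 9], right=True)  # 0: 1-3, 1: 4-6, 2: 7-9, 3: 10

# 10-fold cross-validation: regress on the raw score, then map predictions
# back onto the discrete levels and score agreement.
pred = cross_val_predict(LinearRegression(), X, bedford, cv=10)
pred_levels = np.digitize(pred, bins=[3, 6, 9], right=True)
print(f"10-fold classification accuracy: {np.mean(pred_levels == levels):.2f}")
```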
[Fig.: Classification accuracies, with larger shapes indicating greater overall accuracy]
[Fig.: Classification accuracies of TLX subscales, with larger shapes indicating greater overall accuracy]
4 Summary and Conclusions
[Fig.: Survey results for different game conditions]
Overall, ATHENA has demonstrated that accurate assessments of workload can be achieved by a sensor that relies solely on zero-intrusion metrics. Thus, ATHENA represents a valuable step forward in providing automated workload support tools that can be used on long-duration space missions, as well as a tool for understanding the workload concept.
Acknowledgments
Concepts described above were developed with support from US AF (Contract # FA8650-06-C-6635), NIST (Contract # 70NANB0H3020), ONR (Contract # N00014-09-C-0265) and NASA (Contract # NNX12AB40G). ATHENA was sponsored by NASA SBIR (Contract # NNX15CJ18P), undertaken by SIFT, LLC. We would like to thank Mai Lee Chang, Kristina Holden, Brian Gore, Gordon Voss, Aniko Sandor, Alexandra Whitmire, and Mihriban Whitmore for oversight, guidance, and support.
References
- 1. Miller, C.A., Funk, H., Goldman, R., Meisner, J., Wu, P.: Implications of adaptive vs. adaptable UIs on decision making: Why “automated adaptiveness” is not always the right answer. In: Proceedings of the 1st International Conference on Augmented Cognition, Las Vegas (2005)
- 2. Chen, F., Ruiz, N., Choi, E., Epps, J., Khawaja, M.A., Taib, R., Yin, B., Wang, Y.: Multimodal behavior and interaction as indicators of cognitive load. ACM Trans. Interact. Intell. Syst. (TiiS) 2(4), 22 (2012)
- 3. Wu, P., Ott, T., Paullada, A., Mayer, D., Gottlieb, J., Wall, P.: Inclusion of linguistic features to a zero-intrusion workload assessment technique. In: Proceedings of the 7th AHFE Conference, 27–31 July 2016. CRC Press, Inc. (accepted)
- 4. Roscoe, A.H.: Assessing pilot workload in flight. In: AGARD Conference Proceedings: Flight Test Techniques, Paris (1984)
- 5. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In: Hancock, P.A., Meshkati, N. (eds.) Human Mental Workload. North Holland Press, Amsterdam (1988)
- 6. Meirowitz, M.: Mastermind (1970)
- 7. van Someren, M.W., Barnard, Y.F., Sandberg, J.A.C.: The Think Aloud Method: A Practical Guide to Modelling Cognitive Processes. Academic Press, London (1994)
- 8. Miller, S.: Literature review workload measures. Document ID: N01-006. National Advanced Driving Simulator. http://www.nads-sc.uiowa.edu/publicationStorage/200501251347060.N01-006.pdf (2001). Accessed 7 Jan 2014
- 9. Vizer, L.M., Zhou, L., Sears, A.: Automated stress detection using keystroke and linguistic features: An exploratory study. Int. J. Hum. Comput. Stud. 67(10), 870–886 (2009)
- 10. Tsang, P.S., Vidulich, M.A.: Mental workload and situation awareness. In: Handbook of Human Factors and Ergonomics (2006)
- 11. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 37–57 (2009)