Keywords

1 Introduction

Assessment of mental workload directly from the brain with functional neuroimaging techniques has been an active field of research for many years [1]. Monitoring a human operator's workload level is desirable for preventing underload and overload, both of which can increase human error, and maintaining a balanced workload level can therefore potentially improve working efficiency. An early demonstration of using brain measures to assess working memory was reported by Gevins et al. [2], who collected EEG data from subjects over three days while they performed n-back tasks that induced two memory load levels. Artificial neural networks were used to recognize patterns in the EEG signal for mental workload level classification [2]. Since then, mental workload classification has been investigated by numerous researchers using various neuroimaging techniques [3, 4], in different application areas [5, 6], with many signal processing approaches [7] and across affective contexts [8].

Although promising, previous mental workload classification studies did not take into account how practice time affects the neurophysiological signals and, in turn, the classification results. The dynamics of the acquired signals over time were either not considered, or were controlled by allowing sufficient training time before data acquisition so that behavioral performance reached an asymptotic level [2]. However, several studies have reported a significant effect of practice time on the neurophysiological signals of subjects performing mental tasks [4, 9–11]. These studies suggest that the brain adapts with each practice session and that the effect of practice time should not simply be ignored. Moreover, if the task is complex, it may not be feasible to allow enough practice time for the subjects to become 'trained'. Additionally, some studies are interested in the mental workload of novices who are learning a new task [4], in which case practice before the recording session should not be allowed.

In this study, we investigated the brain's adaptation to a working memory task using a multimodal approach. This work is part of a larger project whose ultimate goal is to investigate mental workload classification with concurrent EEG and fNIRS for building a passive brain-computer interface (BCI). The multimodal approach offers two fundamental benefits over traditional single-modality methods. First, EEG and fNIRS are highly complementary techniques. EEG records electrical activity at the scalp and has millisecond-level temporal precision, but it is spatially blurred by volume conduction. EEG is also susceptible to EOG artifacts, especially over the prefrontal areas. fNIRS, on the other hand, is an optical brain imaging technique that measures the hemodynamic response in the cortical areas. It provides measures that are closely related to functional Magnetic Resonance Imaging (fMRI) [12] with enhanced ecological validity. fNIRS has been adopted to study working memory [13] and the effect of practice time on working-memory-inducing tasks [4]. fNIRS also has the advantage of spatial specificity and is immune to EOG artifacts; however, it has low temporal resolution and is sensitive to head orientation. Second, in addition to the complementary information that the two modalities provide, estimating the correlations between EEG and fNIRS can potentially reveal new biomarkers that cannot be accessed with a single modality. This is evidenced by concurrent fMRI-EEG studies in the literature [14]. Previously, we adopted concurrent EEG-fNIRS to study performance monitoring for improving a P300 BCI [15]. Recently, EEG-fNIRS has been adopted to improve the classification of motor imagery [16], mental workload [17], and the behavioral performance of a working memory task [18].

The objective of this preliminary work is twofold. The first is to test the efficacy of multimodal assessment of working memory practice using a concurrent EEG-fNIRS setup and to verify the results of previous work [4, 9, 10]. Specifically, we expected an increase in alpha/theta activity and a decrease in blood oxygenation with increased practice time. The second is to provide initial insights into the relation between EEG and fNIRS signals. In assessing this relation, we focus on how the two modalities interact with the two experimental factors, memory load and practice time, as groundwork for the next-stage objective of mental workload classification under the influence of practice time.

2 Materials and Methods

2.1 Participants

Sixteen healthy individuals (six female) volunteered for the study. All participants were right-handed (LQ = 80 ± 18, Decile = 6.3 ± 3.1, mean ± SD) and aged between 18 and 30 years (mean ± SD = 22 ± 3). One participant was excluded because the fNIRS data were not recorded, and two more were excluded due to excessive motion artifacts, leaving 13 participants for analysis. All participants gave written informed consent, as approved by the institutional review board of Drexel University, prior to their involvement in the study.

2.2 Recording

We simultaneously recorded EEG and fNIRS from subjects while they were performing the n-back tasks. Figure 1 shows a schematic of the recording setup.

Fig. 1. Schematic of the recording setup

EEG was recorded using a Neuroscan NuAmps amplifier from 28 locations according to the International 10-20 system. The two prefrontal sites Fp1 and Fp2 were not recorded because the fNIRS sensor was placed over the forehead. Two additional electrodes, one placed below the left eye and the other at the right outer canthus, were used to record electrooculography (EOG) activity. All 30 channels (28 EEG + 2 EOG) were referenced to the right mastoid, digitally sampled at 500 Hz and low-pass filtered at 100 Hz for analysis.

Prefrontal fNIRS signals were recorded with COBI Studio [19] from a 16-optode continuous-wave fNIRS system developed at Drexel University [4] and manufactured by fNIR Devices, LLC. The sampling rate was 2 Hz. To ensure repeatable sensor placement, the center of the sensor was aligned with the midline and the bottom edge of the sensor touched the participant's eyebrows.

The EEG and fNIRS signals were synchronized offline using stimulus triggers that were sent from the stimulus presentation software BCI2000 [20] to both the EEG and fNIRS acquisition devices during recording.
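The text does not say how the two trigger streams were aligned. A minimal sketch, assuming each device logs the same ordered list of trigger codes with its own timestamps, is to fit a straight line between the two clocks: the intercept captures the constant offset between recordings and the slope any relative clock drift. The function name and input format are hypothetical.

```python
import numpy as np

def align_fnirs_to_eeg(eeg_trig, fnirs_trig, n_fnirs_samples, fs_fnirs=2.0):
    """Map fNIRS sample times onto the EEG clock using shared triggers.

    eeg_trig, fnirs_trig: lists of (time_s, code) for the same triggers, in the
    same order, as logged by each device. Returns (times_on_eeg_clock, (a, b)).
    """
    codes_eeg = [code for _, code in eeg_trig]
    codes_fnirs = [code for _, code in fnirs_trig]
    if codes_eeg != codes_fnirs:
        raise ValueError("trigger sequences differ; pair them up before fitting")
    t_eeg = np.array([t for t, _ in eeg_trig], dtype=float)
    t_fnirs = np.array([t for t, _ in fnirs_trig], dtype=float)
    # Least-squares line t_eeg ~ a * t_fnirs + b: b is the offset between the two
    # recordings, a absorbs any relative clock drift (a == 1 means no drift).
    a, b = np.polyfit(t_fnirs, t_eeg, deg=1)
    times = a * (np.arange(n_fnirs_samples) / fs_fnirs) + b
    return times, (a, b)
```

With the fNIRS samples expressed on the EEG clock, the two signals can be epoched against the same stimulus onsets.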

2.3 Experimental Paradigm

Participants sat comfortably facing an LED screen placed about 70 cm in front of them. Letter stimuli (1.7° visual angle) were presented at the center of the screen. The stimulus duration was 500 ms and the inter-stimulus interval was 2000 ms. The stimulus sequence was pseudo-randomized so that no letter appeared more than twice in succession.

There were three experimental conditions: 0-back, 1-back and 2-back. The timeline of one n-back block is depicted in Fig. 2. The instruction period informed the subject which task (0-, 1-, or 2-back) to perform. During the task period, 16 letters (5 targets) were shown on the screen in pseudo-random order. In the 0-back condition, the letter 'X' was the target and all other letters were non-targets. In the 1-back and 2-back conditions, a letter was a target if it matched the letter shown one screen back or two screens back, respectively. Subjects were instructed to press a key as soon as they identified a target letter. During the fixation period, subjects were instructed to fixate on a white cross and do nothing, which allowed the fNIRS signals to return to baseline levels.
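Blocks that satisfy the constraints described above (16 letters, 5 targets, no letter more than twice in succession) can be drawn by rejection sampling. The sketch is illustrative only: the letter pool, the function names and the choice to forbid accidental (unplanned) n-back matches are assumptions, not the stimulus code used in the study.

```python
import random
import string

# Hypothetical letter pool; the paper does not list the letters it used.
LETTERS = [c for c in string.ascii_uppercase if c != "X"]

def is_target(seq, i, n):
    """True if seq[i] is a target: 'X' for 0-back, else a match n letters back."""
    if n == 0:
        return seq[i] == "X"
    return i >= n and seq[i] == seq[i - n]

def max_run(seq):
    """Length of the longest run of identical consecutive letters."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def make_block(n, length=16, n_targets=5, seed=None, max_tries=10_000):
    """Draw one n-back block by rejection sampling; returns (letters, target_idx)."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        # Positions < n have no letter n back, so they cannot be n-back targets.
        targets = set(rng.sample(range(n, length), n_targets))
        seq = []
        for i in range(length):
            if i in targets:
                seq.append("X" if n == 0 else seq[i - n])
            else:
                # Exclude the letter n back so a non-target never matches by chance.
                banned = {seq[i - n]} if n > 0 and i >= n else set()
                seq.append(rng.choice([c for c in LETTERS if c not in banned]))
        hits = [i for i in range(length) if is_target(seq, i, n)]
        # Keep the block only if targets are exact and no letter appears 3x in a row.
        if hits == sorted(targets) and max_run(seq) <= 2:
            return seq, hits
    raise RuntimeError("no valid block found; relax the constraints")
```

For example, `letters, target_idx = make_block(2, seed=0)` returns one 2-back block and the indices of its five targets. Note that in 1-back a target is itself a repetition of the previous letter, so two adjacent 1-back targets would produce three identical letters in a row and are rejected by the run check.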

Fig. 2. Timeline of an n-back block

The entire experiment consisted of four sessions, each comprising four repetitions. One repetition included three n-back blocks, one per condition. Hence, there were 48 n-back blocks in the entire experiment, 16 per condition. The block order was randomly shuffled so that no condition occurred twice in succession within a session. One session typically took approximately 12 min to complete, and subjects took a 5 min break between sessions. The entire recording time was about 60 min. Figure 3 shows the protocol outline.
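The ordering constraint can be illustrated with a short sketch. It assumes, as one reading of the text, that each repetition is a random permutation of the three conditions and that a permutation is redrawn whenever it would start with the condition that ended the previous repetition; the names are hypothetical.

```python
import random

CONDITIONS = ("0-back", "1-back", "2-back")

def session_order(rng, n_reps=4):
    """One session: n_reps repetitions, each a random permutation of the three
    conditions, redrawn until no condition occurs twice in a row."""
    order = []
    for _ in range(n_reps):
        while True:
            rep = rng.sample(CONDITIONS, k=len(CONDITIONS))
            if not order or rep[0] != order[-1]:
                break
        order.extend(rep)
    return order

rng = random.Random(1)
protocol = [session_order(rng) for _ in range(4)]   # four sessions, 48 blocks
```

Each session then holds 12 blocks and the whole protocol 48, 16 per condition, matching the counts above.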

Fig. 3. Protocol outline

2.4 fNIRS Analysis

The raw light intensities were first visually inspected to reject problematic optodes located over the hairline or with unusable contact. On this basis, optode 1 was rejected for seven subjects, optode 15 for six subjects and optode 14 for one subject (see Fig. 4 for the optode configuration). We then converted the raw light intensities to oxygenation (ΔOxy) and total hemoglobin (ΔHbt) concentration changes by applying the modified Beer-Lambert law [21]. The ΔOxy and ΔHbt signals were low-pass filtered at 0.1 Hz using a 20th-order finite impulse response (FIR) filter. For each n-back block, the fNIRS epoch from the onset of the first stimulus to 2 s after the onset of the last stimulus (a 40 s window) was then extracted and baseline corrected with respect to the start of the epoch. To reduce the number of variables and the effect of noise, the average activation was calculated for four areas: left lateral (LL), left medial (LM), right medial (RM) and right lateral (RL). Each area included four optodes, as shown in Fig. 4.
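The filtering, epoching, baseline correction and area averaging steps can be sketched as follows, starting from ΔOxy or ΔHbt signals that have already been converted with the modified Beer-Lambert law (not shown). Assumptions: zero-phase filtering with SciPy's `filtfilt`, a baseline equal to the first sample of each epoch, and an optode-to-area mapping that is only a placeholder for the layout in Fig. 4. Rejected optodes are set to NaN so that each area average uses only the remaining optodes.

```python
import numpy as np
from scipy.signal import filtfilt, firwin

FS = 2.0        # fNIRS sampling rate (Hz), from the text
EPOCH_S = 40.0  # epoch length (s), from the text
# Hypothetical optode-to-area mapping (0-based indices); the real layout is in Fig. 4.
AREAS = {"LL": [0, 1, 2, 3], "LM": [4, 5, 6, 7],
         "RM": [8, 9, 10, 11], "RL": [12, 13, 14, 15]}

def lowpass(x, cutoff=0.1, order=20):
    """Order-20 (21-tap) FIR low-pass along time (axis 0).
    Zero-phase filtering via filtfilt is an assumption; the text does not say."""
    b = firwin(order + 1, cutoff, fs=FS)
    return filtfilt(b, [1.0], x, axis=0)

def area_epochs(signal, onsets_s, rejected=()):
    """signal: (n_samples, 16) ΔOxy or ΔHbt; onsets_s: first-stimulus times (s).
    Returns (n_blocks, n_epoch_samples, 4) baseline-corrected area averages."""
    x = lowpass(np.asarray(signal, dtype=float))
    if rejected:
        x[:, list(rejected)] = np.nan   # visually rejected optodes drop out
    n = int(EPOCH_S * FS)
    epochs = []
    for t0 in onsets_s:
        i0 = int(round(t0 * FS))
        ep = x[i0:i0 + n] - x[i0]       # baseline: first sample of the epoch
        epochs.append(np.stack(
            [np.nanmean(ep[:, idx], axis=1) for idx in AREAS.values()], axis=1))
    return np.stack(epochs)
```

At 2 Hz a 40 s window is 80 samples, so each block yields an 80 × 4 array (time × area).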

Fig. 4. The four prefrontal areas: left lateral (LL), left medial (LM), right medial (RM) and right lateral (RL).

A 4 × 4 × 3 [Area (LL; LM; RM; RL) × Session (1−4) × Load (0-back; 1-back; 2-back)] ANOVA with repeated measures on all factors was applied. A similar approach was adopted in previous cognitive-motor executive function adaptation studies [22]. Partial eta squared (\( \eta_{p}^{2} \)) effect sizes were calculated to assist in data interpretation. To assess the effect of practice time, we fitted linear models with session as the independent variable and ΔOxy/ΔHbt as the dependent variable. A separate model was fitted for each load condition, brain area and subject. The slopes of the fitted models were tested at the group level with a Wilcoxon signed-rank test, using one sample per subject (N = 13). A significance criterion of α = 0.05 was used for all tests. For the linear effect of practice time, false discovery rate (FDR) correction [23] was applied to control for Type I error.
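The per-subject linear fits and the group-level test can be sketched with SciPy. This assumes that the FDR procedure in [23] is the Benjamini-Hochberg step-up method and that the input holds one session-averaged value per subject, session, area and load; the function names are hypothetical.

```python
import numpy as np
from scipy.stats import linregress, wilcoxon

def bh_fdr(p):
    """Benjamini-Hochberg adjusted p-values (assumed to be the method of [23])."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]  # keep monotone in rank
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out

def practice_slopes(data):
    """data: (n_subjects, n_sessions, n_areas, n_loads) session-averaged measure.
    Fits measure ~ session for every subject, area and load; returns slopes and R^2."""
    n_sub, n_ses, n_area, n_load = data.shape
    session = np.arange(1, n_ses + 1)
    slopes = np.empty((n_sub, n_area, n_load))
    r2 = np.empty_like(slopes)
    for s in range(n_sub):
        for a in range(n_area):
            for k in range(n_load):
                fit = linregress(session, data[s, :, a, k])
                slopes[s, a, k] = fit.slope
                r2[s, a, k] = fit.rvalue ** 2
    return slopes, r2

def group_test(slopes):
    """Wilcoxon signed-rank test of the slopes against 0 (one value per subject),
    FDR-adjusted across all area x load cells."""
    _, n_area, n_load = slopes.shape
    p = np.array([[wilcoxon(slopes[:, a, k]).pvalue for k in range(n_load)]
                  for a in range(n_area)])
    return bh_fdr(p.ravel()).reshape(p.shape)
```

Adjusting across all area × load cells mirrors the FDR adjustment described for Fig. 7 (four areas × three loads). The same functions would apply to the EEG analysis with channels in place of areas.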

2.5 EEG Analysis

The EEG signals were band-pass filtered between 0.1 and 50 Hz with a zero-phase infinite impulse response (IIR) filter, epoched from −500 ms to 1500 ms, and baseline corrected with respect to the 500 ms preceding stimulus onset at 0 ms. A threshold-based approach was used to reject artifact-contaminated epochs. Epochs with incorrect responses, i.e., key presses to non-targets and missed targets, were also rejected from analysis. In total, about 19 % of the epochs were rejected.
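A sketch of this preprocessing chain is given below. The filter type and order (a 4th-order Butterworth applied with SciPy's `sosfiltfilt`) and the ±100 µV rejection threshold are assumptions, since the text specifies neither; `correct` is a hypothetical per-stimulus flag that is False for false alarms and misses.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 500  # EEG sampling rate (Hz), from the text

def preprocess(eeg, onsets, correct, thresh_uv=100.0):
    """eeg: (n_channels, n_samples) in microvolts; onsets: stimulus sample indices;
    correct: bool per stimulus. Returns kept epochs (n_kept, n_channels, 1000)
    and the indices of the stimuli they belong to."""
    # Zero-phase band-pass 0.1-50 Hz; the 4th-order Butterworth is an assumption.
    sos = butter(4, [0.1, 50.0], btype="bandpass", fs=FS, output="sos")
    x = sosfiltfilt(sos, eeg, axis=-1)
    pre, post = int(0.5 * FS), int(1.5 * FS)   # -500 ms .. +1500 ms
    kept, idx = [], []
    for k, (t, ok) in enumerate(zip(onsets, correct)):
        ep = x[:, t - pre:t + post]
        ep = ep - ep[:, :pre].mean(axis=1, keepdims=True)  # pre-stimulus baseline
        # Hypothetical amplitude threshold; the paper does not give its value.
        if ok and np.abs(ep).max() <= thresh_uv:
            kept.append(ep)
            idx.append(k)
    return np.stack(kept), idx
```

An epoch is kept only if the response was correct and no channel exceeds the threshold after baseline correction.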

The periodogram of each EEG epoch was estimated using a Hann window. For each memory load condition, the power spectra were averaged over all correctly responded blocks of a session. For this preliminary study, we only considered the EEG channels along the midline.
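Band power can then be read off the averaged Hann-windowed periodogram. The sketch assumes theta = 4-8 Hz and alpha = 8-13 Hz, band edges the text does not state, and integrates the power spectral density over each band; the function name is hypothetical.

```python
import numpy as np
from scipy.signal import periodogram

FS = 500  # EEG sampling rate (Hz), from the text
# Band edges are assumptions; the text does not state them.
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0)}

def band_power(epochs):
    """epochs: (n_epochs, n_channels, n_samples). Hann-windowed periodogram per
    epoch, averaged over epochs, then integrated over each frequency band."""
    f, pxx = periodogram(epochs, fs=FS, window="hann", axis=-1)
    mean_psd = pxx.mean(axis=0)                # (n_channels, n_freqs)
    df = f[1] - f[0]                           # 0.5 Hz for 2 s epochs
    return {name: mean_psd[:, (f >= lo) & (f < hi)].sum(axis=-1) * df
            for name, (lo, hi) in BANDS.items()}
```

As a sanity check, a unit-amplitude 10 Hz sinusoid has power 0.5, and nearly all of it falls in the assumed alpha band.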

A 6 × 4 × 3 [Channel (Fz; FCz; Cz; CPz; Pz; Oz) × Session (1−4) × Load (0-back; 1-back; 2-back)] ANOVA with repeated measures on all factors was applied. The \( \eta_{p}^{2} \) effect sizes were calculated to facilitate interpretation of the results. As with fNIRS, we fitted linear models with session as the independent variable and alpha/theta power as the dependent variable. A separate model was fitted for each load condition, channel and subject. The slopes of the fitted models were tested at the group level with a Wilcoxon signed-rank test, using one sample per subject (N = 13). For the linear effect of practice time, false discovery rate (FDR) correction [23] was applied to control for Type I error.
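Partial eta squared can also be recovered from a reported F statistic, which offers a quick consistency check. Because F = (SS_effect/df_effect)/(SS_error/df_error), it follows that η_p² = F·df_effect/(F·df_effect + df_error). A sphericity correction that scales both degrees of freedom by the same factor leaves this value unchanged, so the formula also applies to fractional degrees of freedom such as those reported in Sect. 3.1. A minimal Python version:

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom.

    eta_p^2 = SS_effect / (SS_effect + SS_error). Since
    F = (SS_effect / df_effect) / (SS_error / df_error), this equals
    F * df_effect / (F * df_effect + df_error).
    """
    num = f * df_effect
    return num / (num + df_error)

# Two values reported in Sect. 3, reproduced to two decimals:
print(round(partial_eta_squared(6.34, 2, 24), 2))          # ΔOxy, Load: prints 0.35
print(round(partial_eta_squared(13.59, 1.38, 16.55), 2))   # key-press delay: prints 0.53
```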

3 Results

3.1 Behavioral Measures

We considered two behavioral measures: (1) target accuracy, the percentage of correctly responded targets; and (2) key-press delay, the time from stimulus onset to the key-press response for a correctly responded target. The measures were standardized across load and session within each subject before analysis. Figure 5 shows the average behavioral performance. A 4 × 3 [Session (1−4) × Load (0-back; 1-back; 2-back)] repeated measures ANOVA revealed significant main effects of Load on target accuracy (F(1.14,13.69) = 6.22, p < 0.05, \( \eta_{p}^{2} \) = 0.34) and key-press delay (F(1.38,16.55) = 13.59, p < 0.001, \( \eta_{p}^{2} \) = 0.53). Target accuracy was 98.0 % (SD = 2.5) for 0-back, 97.2 % (SD = 2.9) for 1-back and 94.0 % (SD = 5.8) for the 2-back condition. Key-press delay was 450 ms (SD = 54) for 0-back, 450 ms (SD = 88) for 1-back and 549 ms (SD = 145) for the 2-back condition. No significant effect of Session was found.
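Standardizing within each subject, as described above and in several figure captions, can be sketched as a z-score over all session × load cells of that subject. Using the sample standard deviation (ddof = 1) is an assumption; the text does not specify it, and the function name is hypothetical.

```python
import numpy as np

def standardize_within_subject(x):
    """x: (n_subjects, n_sessions, n_loads). z-scores each subject's values over
    all session x load cells, so between-subject offsets and scales drop out."""
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=(1, 2), keepdims=True)
    sd = x.std(axis=(1, 2), ddof=1, keepdims=True)   # sample SD (assumed)
    return (x - mu) / sd
```

Because each subject is rescaled separately, a subject whose raw values are uniformly higher or more variable contributes the same pattern as any other subject.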

Fig. 5. Average behavioral performance for each task load across sessions (error bars are standard error of the mean).

3.2 fNIRS Measures

The average fNIRS measures for each task load condition across sessions are shown in Fig. 6.

Fig. 6. Average fNIRS activations (and standard error) after subject-wise standardization. LL – left lateral; LM – left medial; RM – right medial; RL – right lateral.

ΔOxy. The 4 × 4 × 3 ANOVA revealed significant Area × Load (F(6, 72) = 4.42, p < 0.001, \( \eta_{p}^{2} \) = 0.27) and Session × Load (F(6, 72) = 3.28, p < 0.01, \( \eta_{p}^{2} \) = 0.21) interactions and a significant main effect of Load (F(2, 24) = 6.34, p < 0.01, \( \eta_{p}^{2} \) = 0.35). Pairwise comparisons revealed significantly lower medial prefrontal activation in the 2-back condition than in the 1-back and 0-back conditions (see Table 1 for details). For the 2-back condition, activation was significantly lower in Session 4 than in Session 1 (t(12) = 4.68, adjusted p < 0.05). No significant session-wise differences were found for the 1-back and 0-back conditions. The Wilcoxon signed-rank test showed a significant linear fit for 2-back only (see Fig. 7). The topography of prefrontal activation is shown in Fig. 8.

Table 1. Significant pair-wise comparison results (Area × Load)

Fig. 7. Linear fit results for ΔOxy as a function of practice time in the 2-back condition. Each line represents the linear fit for one subject (N = 13). The p-values shown were FDR-adjusted across the four brain areas and the three workload levels. The median coefficient of determination (\( R^{2} \)) across all subjects is also shown. No significant results were found for 0-back, 1-back or ΔHbt.

Fig. 8. ΔOxy topography plots. Data were standardized subject-wise across all conditions and sessions before averaging.

Alpha. The 6 × 4 × 3 ANOVA revealed a significant Session × Load interaction (F(6, 72) = 3.87, p < 0.05, \( \eta_{p}^{2} \) = 0.21) and significant main effects of Session (F(3, 36) = 4.27, p < 0.05, \( \eta_{p}^{2} \) = 0.30) and Load (F(2, 24) = 47.67, p < 0.001, \( \eta_{p}^{2} \) = 0.83). Post hoc contrasts revealed a significant linear trend for both Session (F(1,10) = 19.64, p < 0.001) and Load (F(1,10) = 56.08, p < 0.001). Alpha power increased with session number and decreased with increasing memory load, as shown in Fig. 9. Pair-wise comparison revealed significantly lower alpha power in Session 1 than in Session 4 (t(12) = 5.04, adjusted p < 0.01). The Wilcoxon signed-rank test showed a significant linear fit for all memory load conditions (see Table 2). We further compared the coefficients of determination (\( R^{2} \)) of the fitted linear models across load conditions with a Wilcoxon signed-rank test. At Pz, \( R^{2} \) was significantly higher in the 2-back than in the 0-back condition (adjusted p < 0.005).

Table 2. Linear fit results for alpha activity as a function of practice time. Results shown are the median coefficients of determination across all subjects. Significant results (FDR-adjusted p < 0.05) are marked with an asterisk. No significant results were found for theta activity.

Fig. 9. Average alpha and theta band power (and standard error) after subject-wise standardization.

Fig. 10. Topography of alpha power (created with EEGLAB [24]). Alpha power was standardized subject-wise across all conditions and sessions before averaging.

Theta. The 6 × 4 × 3 ANOVA revealed a significant Channel × Load interaction (F(10, 120) = 3.05, p < 0.05, \( \eta_{p}^{2} \) = 0.23). However, no pair-wise comparison passed the significance threshold (FDR-adjusted p < 0.05), and no significant linear trend was found.

4 Discussion

Our main results showed that EEG alpha power decreased with higher workload (2-back compared to 0- and 1-back) and, conversely, increased with practice time over the course of the task. These results agree with previous work [10]. Theta activity, however, showed only a significant interaction between memory load and channel. This may be because our n-back task was not challenging enough for the subjects, as evidenced by the high target detection accuracy (96.4 % overall and 94.0 % for the 2-back condition). For the same reason, behavioral performance was affected by workload level but not by practice time. The fNIRS oxygenation and total hemoglobin levels were also compared, and the results for the first session of the protocol are in line with previous reports in which activation increased with task load [3, 4].

Interestingly, a significant decrease in 2-back related activation was observed across sessions from the beginning to the end of the recording period, although there was no change in behavioral performance. These findings suggest that adaptation to the task took place over the course of the recording, consistent with the concurrent increase in EEG alpha power. There was no change in the cortical oxygenation measures for 0- and 1-back, as these are very low task load conditions. The decrease in 2-back cortical activation with practice time reversed the load-related activation pattern relative to the first session. These reversals of load activation patterns might be due to a combination of adaptation and task difficulty effects. It should also be considered that the current protocol is longer than many previous experiments. Since the task was not challenging enough for the subjects, one explanation is that the cortical oxygenation measures were dominated by the effect of practice time. Another possibility is that, after practicing the task for some time, subjects needed more effort to stay focused on the easier 0-back and 1-back tasks than on the 2-back task, owing to boredom.

The practice time effect observed in the current study is also present in a previous fNIRS-based n-back classification study [3]. A similar decrease in cortical oxygenation, and also in total hemoglobin levels, can be seen for the highest load condition (3-back) in the dataset published by the authors of [3] (see Fig. 11).

Fig. 11. Average fNIRS measures of the data collected in [3]. The change in the 3-back activation pattern over time shows a decreasing trend similar to that reported in the current study. Data were baseline corrected with respect to the start of each block and standardized across blocks and conditions (1-, 2-, 3-back) for each optode. The four regions of interest were defined as: LL – Channels 7 and 8; LM – Channels 5 and 6; RM – Channels 1 and 2; RL – Channels 3 and 4.

The effect size estimates show that mental workload level has the largest effect on alpha power (\( \eta_{p}^{2} \) = 0.83), followed by key-press delay (\( \eta_{p}^{2} \) = 0.53), ΔHbt (\( \eta_{p}^{2} \) = 0.42), ΔOxy (\( \eta_{p}^{2} \) = 0.35) and target detection accuracy (\( \eta_{p}^{2} \) = 0.34). This suggests that alpha power is better suited than behavioral performance metrics for characterizing mental workload, especially when performance is at ceiling. The fNIRS measures showed larger effects than target detection accuracy and may provide valuable information for mental workload classification. Furthermore, we observed a large practice time effect on alpha power (\( \eta_{p}^{2} \) = 0.30) and large interaction effects between load and practice time on alpha power (\( \eta_{p}^{2} \) = 0.21) and ΔOxy (\( \eta_{p}^{2} \) = 0.21), suggesting that these effects cannot simply be ignored.

Our results are consistent with previous EEG and fNIRS work and demonstrate that brain adaptation with practice time can be observed in both types of brain signals [4, 10]. The increased alpha activity and decreased cortical oxygenation in the 2-back condition suggest lower brain activation and a tendency of the brain toward an idle state with increased practice time. It should be noted that many other studies have shown different brain activation patterns before and after practice. In [25], the authors categorized these patterns into four classes: (1) increased brain activation after practice; (2) decreased brain activation after practice; (3) a combination of increased and decreased activation, suggesting a functional reorganization [9] of brain activity; and (4) changes in the activated brain areas. In the same paper, the authors concluded that the exact reasons for these inconsistent patterns are still unknown but could include factors such as task complexity, task load demand and the length of practice, which can range from 20 min to two months. Other factors such as fatigue and boredom also need to be considered.

In summary, these preliminary results suggest that the effect of practice time needs to be carefully examined when performing mental workload classification with neurophysiological signals. Ignoring practice time may degrade classification accuracy. Related to the practice time effect, a substantial number of machine learning and EEG classification studies have investigated how to compensate for so-called 'covariate shift' [26]. These studies may shed light on how to control for the practice time effect observed in this study. Further studies are required to investigate how brain activation patterns change throughout the task and to determine how this information can inform mental workload classification.