1 Introduction and Background

Practice-related changes in brain activity have been reported in a variety of laboratory tasks and paradigms. Here we report preliminary results from a study aimed at investigating patterns of brain activity associated with practice and learning in a visual object recognition task. What distinguishes this study from similar ones is the use of real-world, complex stimuli (i.e., military aircraft) and a variety of measurements, such as accuracy, response time, eye fixations, pupil size, and fMRI BOLD signal. The purpose of this investigation is to inform the development of computational cognitive models (Anderson et al. 2007) that can be used to suggest instructional interventions that maximize learning and engagement and can be embedded in intelligent, adaptive tutoring systems. Anderson et al. (2010) have put forward a compelling argument for the value of combining behavioral and brain imaging data with computational cognitive modeling for the purpose of informing the development of intelligent tutoring systems.

The most robust result in the literature is that repeated practice is associated with reductions in activity in task-specific and task-general brain regions (Chein and Schneider 2005). However, this result applies only when there is a consistent mapping between stimuli and responses and no change in strategy is expected to occur with practice and learning. Increases in activity with practice are reported in paradigms that aim to develop skills (e.g., mirror reading, Poldrack and Gabrieli 2001) or capacities (e.g., working memory, Olesen et al. 2004). In these paradigms, strategy shifts are expected and even seen as a desirable effect of training. We expect our task to be more prone to strategizing than simpler laboratory tasks and conceive of strategy learning as an intrinsic component of skill acquisition. We expect participants to develop strategies to inspect stimuli and their features, search for relevant information, and encode, maintain, and retrieve information in long-term memory. These activities will likely be associated with changes in neural activity that can be detected with fMRI. Given the purpose of our study, we are not just interested in quantitative changes such as increases or decreases in activity with learning. Qualitative changes are more interesting because we can learn something about the structure of thought, for example, whether a participant changes strategies or allocates different resources to the task at hand. In addition, since our main purpose is to inform the development of cognitive models, the temporal dimension of task performance is very important. We assume that the brain reacts differently at different stages of skill acquisition. Thus, the question is not whether activation increases or decreases but rather when it increases and when it decreases.

A number of theoretically informed regions of interest (ROIs) were defined based on the literature on neural correlates of practice and learning (e.g., Anderson et al. 2011; Borst and Anderson 2014; Supekar et al. 2013). These were brain regions associated with visual recognition (fusiform gyrus, middle occipital gyrus), manipulations of spatial representations (posterior parietal), storage and retrieval of declarative memories (hippocampus, prefrontal cortex), cognitive control (anterior cingulate), motor control (areas around the central sulcus), automaticity (basal ganglia), and workload (insula). We also used a control ROI that was known a priori to be insensitive to the experimental manipulation – the auditory cortex.

1.1 Tasks

In the experimental task (Fig. 1A), participants saw an aircraft image and had to select its name out of four options. There were 75 different aircraft images in total. The control task (Fig. 1B) was similar except it used familiar stimuli. There were 52 different control images in total.

Fig. 1. A (left side): The experimental task and B (right side): The control task

2 Method

2.1 Participants

Fifteen participants were recruited for this study from Wright State University’s undergraduate and graduate student population. Over the 10-week duration of the study, more than 50 % of the participants dropped out or were excluded for various reasons. This attrition rate is not uncommon for longitudinal studies.

2.2 Design

A pilot study was run with five participants to determine the length of the study necessary to achieve asymptotic performance in terms of both accuracy and response time for an average participant. Based on what we learned from the pilot study, we decided to include 10 sessions in the main study, one session per week, and insert the brain imaging sessions at weeks 2, 6, and 10. Table 1 shows a schematic of the design.

Table 1. Layout of the experimental design

2.3 Apparatus

Eye fixations and pupil size were recorded using a video-based eye tracker (EyeLink). Neuroimaging was performed using a 1.5 Tesla MR scanner (General Electric Excite HDX; General Electric, Milwaukee, Wisconsin) with an eight-channel head coil. Visual stimuli were projected onto a screen positioned at the foot end of the bore. A mirror affixed to the head coil enabled the participants to view the screen. A fiber-optic button response unit was used to record participants’ behavioral performance.

3 Results and Discussion

Here we present exploratory analyses and preliminary results, mainly descriptive statistics and visualizations. More detailed quantitative analyses, including inferential statistics, will be presented at the conference and in a subsequent journal paper.

3.1 Pilot Study Results

The purpose of the pilot study was to get a sense of how many sessions were needed in order to achieve asymptotic performance. In addition, we were interested to learn about the participants’ strategies and how they organized their learning in the aircraft task. Figure 2 shows how (A) accuracy and (B) response time change as a function of session. It is not clear whether the average performance has reached an asymptote, particularly with regard to response time. Based on this observation we decided to increase the number of sessions to 10 to ensure solid learning and skill acquisition.

Fig. 2. (A) Accuracy (left plot) as a function of session. (B) Response time (right plot) as a function of session. Error bars are 95 % confidence intervals.

After task completion, we debriefed the participants with regard to their learning strategies. A variety of strategies were reported, such as: directly associating aircraft shapes and names, finding distinctive features of aircraft to help distinguish among similar aircraft, selecting the most memorable part of a name (e.g., the word “sea”, animal names) and discarding apparently irrelevant parts (e.g., numbers), using features and labels to group aircraft in categories, looking for logical associations (e.g., helicopters without wheels have “sea” in their name), giving ad hoc names to particular shapes, etc. We concluded that the task is conducive to strategizing and that performance in this task may be a function of how participants organize their learning (strategic learning and executive control) in addition to visual processing and memory per se.

3.2 Behavioral Results

Extending the study to 10 sessions proved to be a good decision because performance continued to improve after session 6, although it did not reach the maximum possible level (100 % accuracy). Since the accuracy variable was somewhat bimodal in distribution, we chose to show descriptive statistics for two ad hoc groups that we call “fast learners” and “slow learners”, respectively. We show the two groups separately to illustrate an interesting difference in learning strategy (see also the section on eye tracking results); we do not claim that the two groups are statistically distinct.

Figure 3 shows how accuracy increases over the 10 sessions, starting from slightly above chance level (25 %) in session 1 and ending at almost ceiling level (100 %) in session 9 for fast learners. Figure 4 shows reductions in response time associated with practice in the two groups. Performance in the fMRI scanner (sessions 2, 6, and 10) tends to be lower than expected based on the learning trajectory (i.e., accuracy is lower and response time is higher in the scanner than out of scanner). This effect has been attributed to specific attentional and motor deficits caused by the scanner environment and the scanning procedure (Van Maanen et al. 2015).

Fig. 3. Accuracy by session and round for fast and slow learners: for each session, the training rounds are indexed numerically (1 to 3) and the testing round is indexed with the letter “T”. The shaded areas represent 95 % confidence intervals (Color figure online).

Fig. 4. Response time (in seconds) by session and round for fast and slow learners (Color figure online)

The tipping point of the separation between the two groups is around sessions 3 and 4, where the accuracy of the fast learners increases markedly faster than that of the slow learners. The confidence intervals are also wider around this point, suggesting that the cause of separation between groups is also an important determinant of individual differences among learners. The response time pattern (Fig. 4) suggests an interesting strategic difference between the two groups: fast learners tend to be more deliberative; they spend more time inspecting the stimuli in the first 3 sessions than slow learners (see the section on eye tracking results for a corroboration of this interpretation).

Eye Tracking Results

Eye-tracking data provided additional insights into the learning strategies of slow and fast learners. We analyzed the amount of time each participant looked at each of the following five components of the interface: object image, correct name option, incorrect name options, feedback, and progress (i.e., image index and score). We grouped the eye tracking data in two classes based on when it occurred in a given trial: (1) before a response was made and (2) after feedback was provided. Figure 5 shows the eye fixation patterns for (A) fast and (B) slow learners in trials in which they gave correct responses. The interface components of interest are shown on the X-axis. Each vertical bar represents a behavioral session (note: eye tracking data was not collected in the fMRI sessions). We notice that fast learners look longer at the aircraft image than slow learners, particularly after a response was made. In contrast, slow learners look longer at feedback and progress regions. Another interesting observation is that fast learners tend to increase the time they spend looking at the object as they learn. This increase occurs around sessions 3 and 4, which coincides with the moment in which fast learners clearly differentiate themselves from slow learners. Arguably, the extra time in which fast learners inspect the object after a correct response was given is spent rehearsing and attempting to consolidate their memories by use of mnemonic strategies.

Fig. 5. Eye fixation patterns for fast and slow learners in trials in which a correct response was given.
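
The dwell-time analysis described above can be illustrated with a minimal sketch. It assumes each fixation has already been assigned to one of the five interface components and to one of the two trial phases; the record layout, the function name `dwell_time_by_component`, and all durations are hypothetical and are not taken from the study data.

```python
from collections import defaultdict

# Hypothetical fixation records: (component, phase, duration in ms).
# Component and phase labels mirror the text; the numbers are made up.
FIXATIONS = [
    ("object image", "before response", 820),
    ("correct name option", "before response", 310),
    ("incorrect name options", "before response", 450),
    ("object image", "after feedback", 600),
    ("feedback", "after feedback", 240),
    ("progress", "after feedback", 120),
    ("object image", "after feedback", 200),
]


def dwell_time_by_component(fixations):
    """Sum fixation durations for each (phase, component) pair."""
    totals = defaultdict(int)
    for component, phase, duration_ms in fixations:
        totals[(phase, component)] += duration_ms
    return dict(totals)


totals = dwell_time_by_component(FIXATIONS)
for (phase, component), ms in sorted(totals.items()):
    print(f"{phase:16s} {component:24s} {ms} ms")
```

Keying the totals by phase and component keeps the two classes of eye tracking data (before a response and after feedback) separate, so the same table can be averaged per participant and per session before groups are compared.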

We also analyzed changes in pupil diameter as a function of task and practice. Pupil dilation has been interpreted as an index of cognitive resources allocated to the task at hand (Granholm and Steinhauer 2004; Siegle et al. 2003). We assumed that the pupil size of a given individual should be larger in the aircraft task than in the control task and took the difference between the two measures (here referred to as pupil dilation for brevity) to represent a normalized index of an individual’s resource allocation strategy. Figure 6A shows that pupil dilation is higher in fast learners and peaks around sessions 3 and 4. Paradoxically, in some sessions, slow learners show “negative” pupil dilation, that is, larger pupil size for control vehicles than for aircraft. This is obviously an inappropriate resource allocation strategy that may be responsible for the poorer performance observed in slow learners. Figure 6B shows that pupil dilation in session 1 strongly predicts aircraft accuracy over the entire experiment (r = 0.63, p = 0.04). Recall from Fig. 3 that aircraft accuracy in session 1 was similar in the two groups and close to chance level. This result suggests that the overall performance in this study was determined to a large extent by the participants’ willingness to allocate cognitive resources to the aircraft task at the outset of the study.

Fig. 6. A (left side): Pupil dilation by session (except fMRI sessions 2, 6, and 10) in fast learners (dark bars) as compared to slow learners (light bars). B (right side): Correlation between pupil dilation in session 1 and overall aircraft accuracy.
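
As an illustration of the pupil dilation index and its correlation with accuracy, the sketch below computes the aircraft-minus-control difference per participant and a Pearson correlation coefficient. The values are invented for illustration (arbitrary units) and do not reproduce the reported r = 0.63; the function names are hypothetical, and the study's actual preprocessing of pupil size is not specified here.

```python
import math


def pupil_dilation(aircraft_size, control_size):
    """Difference index used in the text: aircraft minus control pupil size."""
    return aircraft_size - control_size


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical session-1 values per participant; not study data.
aircraft = [4.2, 4.0, 3.9, 4.5, 3.8]
control = [4.0, 4.1, 3.7, 4.0, 3.9]
accuracy = [0.85, 0.60, 0.75, 0.95, 0.55]

dilation = [pupil_dilation(a, c) for a, c in zip(aircraft, control)]
r = pearson_r(dilation, accuracy)
print([round(d, 2) for d in dilation])
print(round(r, 2))
```

A negative entry in `dilation` corresponds to the “negative” pupil dilation described above, that is, a larger pupil size in the control task than in the aircraft task.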

3.3 Region of Interest Analysis

Overall, across all brain regions of interest, in the control task, we see the usual pattern of activation reported in the literature: activation decreases with practice (Chein and Schneider 2005). In the aircraft task, we see a different pattern: activation increases in the learning phase from week 2 to week 6 and decreases in the consolidation phase from week 6 to week 10 (Fig. 7A). This distinction is even clearer in the activation extent data (Fig. 7B). These data represent to what extent (percentage) a region is activated above threshold.

Fig. 7. A (left side): Activation intensity by fMRI session (week). B (right side): Activation extent by fMRI session (week). Error bars represent 95 % confidence intervals.
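
Activation extent, as defined above, is the percentage of a region that is activated above threshold. The following sketch shows that computation for a single region at the three fMRI sessions. The voxel values and the cutoff of 2.3 are assumptions chosen for illustration; the threshold actually used in the study is not stated here.

```python
def activation_extent(voxel_values, threshold):
    """Percentage of a region's voxels whose statistic exceeds the threshold."""
    if not voxel_values:
        raise ValueError("region has no voxels")
    above = sum(1 for v in voxel_values if v > threshold)
    return 100.0 * above / len(voxel_values)


# Hypothetical voxel statistics for one ROI at weeks 2, 6, and 10.
roi_by_week = {
    2: [1.2, 2.8, 3.1, 0.4, 2.0, 1.9, 3.5, 0.7],
    6: [3.3, 2.9, 3.6, 2.4, 3.0, 1.1, 4.2, 2.7],
    10: [2.5, 1.8, 3.2, 0.9, 2.6, 1.0, 3.4, 1.5],
}
THRESHOLD = 2.3  # assumed cutoff, not taken from the paper

extent = {
    week: activation_extent(values, THRESHOLD)
    for week, values in roi_by_week.items()
}
print(extent)  # {2: 37.5, 6: 87.5, 10: 50.0}
```

The made-up values were chosen so that extent rises from week 2 to week 6 and falls by week 10, mirroring the qualitative pattern described for the aircraft task.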

We further investigated the distinction between the aircraft task and the control task in each region of interest (see Fig. 8). We see that the distinction is more pronounced in particular regions of interest and almost nonexistent in others. The largest differences are seen in fusiform gyrus, middle occipital gyrus, and superior parietal lobule. These are areas involved in developing visual object representations and operating on them. Part of this effect may be a “set size effect”, considering that there are many more aircraft than control vehicles. More interesting are the differences seen in the inferior, middle, and medial frontal gyri, thought to reflect retrieval and cognitive control operations.

Fig. 8. Activation intensity by brain region of interest in the aircraft and control tasks

Thus, there seem to be more than just quantitative differences between the two tasks. The participants seem to employ different strategies in the aircraft task as compared to the control task. The control task seems to be executed in a more automatic way, while the aircraft task is performed in a more controlled and deliberate manner.

Next we analyzed the dynamics of activation in the two tasks as the participants advanced in their learning. In the aircraft task, we notice increases in activation from week 2 to week 6 in caudate and hippocampus reflecting development of procedures (or rules) and declarative memories, respectively. The anterior cingulate cortex (ACC) also shows an increase in activation at week 6. This probably reflects the increased conflict and interference that occur with learning to distinguish between somewhat similar objects and features. The control task shows different dynamics. Activation is much lower in magnitude to start with and decreases with practice. There is not much change of activation in areas related to memory and strategy development or deployment (ACC, caudate, hippocampus, and frontal gyri), reflecting the high level of automaticity reached by this task.

3.4 Whole-Brain Exploratory Analysis

A theoretically agnostic exploratory analysis was performed to uncover potential activation patterns that were not found in previous research or may be specific to our task and experimental setup. We identified all clusters of activity (i.e., adjacent voxels activated above threshold) as a function of task (aircraft vs. control) and session (weeks 2, 6, and 10). For each cluster, we determined its size (i.e., number of voxels), identified all the local maxima, and listed the brain regions that these maxima belonged to. The overall pattern of changes in brain activity is consistent with the one observed in the ROI analysis (see Sect. 3.3).
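
The cluster identification step can be sketched as a flood fill over suprathreshold voxels. This is a toy illustration, not the software used in the study: the face-adjacency rule (6-connectivity), the threshold, the voxel values, and the function names are all assumptions. Mapping local maxima to anatomical labels is omitted.

```python
from collections import deque

# Face-adjacent neighbours in 3D (6-connectivity); an assumed adjacency rule.
NEIGHBOR_OFFSETS = [
    (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1),
]


def find_clusters(volume, threshold):
    """Group suprathreshold voxels into clusters of adjacent voxels.

    volume maps (x, y, z) -> statistic value. Returns clusters, largest first.
    """
    active = {xyz for xyz, value in volume.items() if value > threshold}
    seen = set()
    clusters = []
    for start in sorted(active):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        voxels = []
        while queue:  # breadth-first flood fill from the seed voxel
            x, y, z = queue.popleft()
            voxels.append((x, y, z))
            for dx, dy, dz in NEIGHBOR_OFFSETS:
                nb = (x + dx, y + dy, z + dz)
                if nb in active and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        peak = max(voxels, key=lambda v: volume[v])
        clusters.append({"voxels": voxels, "size": len(voxels), "peak": peak})
    clusters.sort(key=lambda c: c["size"], reverse=True)
    return clusters


def local_maxima(voxels, volume):
    """Voxels whose value exceeds that of every face-adjacent neighbour."""
    maxima = []
    for x, y, z in voxels:
        neighbours = [(x + dx, y + dy, z + dz) for dx, dy, dz in NEIGHBOR_OFFSETS]
        if all(volume[(x, y, z)] > volume.get(nb, float("-inf")) for nb in neighbours):
            maxima.append((x, y, z))
    return maxima


# Hypothetical statistic map; coordinates and values are made up.
volume = {
    (0, 0, 0): 3.0, (1, 0, 0): 4.1, (2, 0, 0): 2.9, (3, 0, 0): 3.8,
    (0, 1, 0): 1.0, (1, 1, 0): 2.6,
    (5, 5, 5): 3.3, (5, 5, 6): 2.5,
    (9, 9, 9): 0.5,
}
clusters = find_clusters(volume, threshold=2.3)
for cluster in clusters:
    print(cluster["size"], cluster["peak"], sorted(local_maxima(cluster["voxels"], volume)))
```

In this toy volume the largest cluster contains two local maxima, which illustrates why a single cluster can span more than one brain region.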

With regard to the content of the clusters, the general trend that emerges across individuals is as follows. The first cluster tends to capture visual and representational areas in the occipital, temporal and parietal lobes, and cerebellum. The second cluster and the subsequent ones capture frontal areas (inferior frontal gyrus, middle frontal gyrus, superior frontal gyrus, and cingulate gyrus), motor areas (precentral and postcentral gyri), and subcortical areas (e.g., thalamus, basal ganglia). In the aircraft task, we see a qualitative shift from week 2 dominated by occipital areas to week 6 and week 10 showing the emergence of the fusiform gyrus and other areas in the temporal lobe (inferior and middle temporal gyri) reflecting complex object representation and memory. Frontal and parietal areas involved in cognitive control, interference resolution, attention, and memory retrieval tend to be activated throughout the entire study in the aircraft task. In the control task, at week 2 the pattern looks much like the aircraft task, but by week 6 we see the emergence of medial frontal gyrus, possibly indicating a strategy shift. By week 10, we see the emergence of posterior parietal areas and the decline of frontal and temporal involvement, which suggests that by week 10 the strategy might be based on direct representations bypassing memory and control.

3.5 Correlations of Brain Activation and Behavioral Performance

We examined correlations between brain activity in our predefined regions of interest and behavioral performance (i.e., accuracy in identification of aircraft images). Most of these correlations were not statistically significant due to the very low number of participants (7). However, an interesting pattern emerged from the analysis of the correlations from all regions of interest (Fig. 9). The correlations tend to be negative at week 2, turn positive at week 6, and decrease markedly, approaching zero, at week 10. This pattern is similar for both activation intensity (Fig. 9A) and activation extent (Fig. 9B).

Fig. 9. Pattern of correlations between brain activation (A, left side) intensity and (B, right side) extent and task performance (aircraft accuracy) at start, midpoint, and end of study. Error bars represent 95 % confidence intervals.

The negative correlations at week 2 can be interpreted as indicating higher and perhaps inefficient metabolic expenditure for low performers and relatively lower metabolic expenditure for high performers. This can also be associated with higher anxiety or stress for low performers. By week 6, all participants have substantially increased their accuracy. Their brain activity at this stage could reflect how much of the relevant brain resources participants recruit for the task. The participants who are able to recruit more of these resources perform better, which explains the positive correlation. By week 10, most participants reach a high level of performance (around 80 %) and the task has become less challenging for everybody, which explains the decrease in the magnitudes of the correlations. Thus, the participants who gradually recruit more brain resources as they learn the task tend to perform better. When their performance reaches a high level, they may decrease the level of metabolic expenditure. The participants who show relatively higher levels of brain activity and relatively lower performance in the early stages of learning tend to learn at a slower rate.

4 Conclusion

The preliminary analyses reported here suggest that learning progresses differently in the two tasks. The control task reaches high levels of automation, which is associated with decreases in activation in most brain regions. The aircraft task recruits more brain resources as the participants learn to master it, particularly from brain regions involved in memory, representation, and control. Learners would benefit from a tutoring process aimed at aiding memory encoding and retrieval, making correct associations, and resolving ambiguity and interference among representations and associations.