Introduction

Determining hemispheric language dominance is essential during the preoperative planning of neurosurgical resection to minimize the risk of postoperative verbal deficits. The clinical gold standard for lateralizing language function in neurosurgical planning is the Wada test (intracarotid amobarbital test) [1]. However, due to its invasive nature and the brief effect of ultrashort-acting barbiturates, conclusions from the Wada test are sometimes uncertain. Therefore, alternative non-invasive methods, such as functional magnetic resonance imaging (fMRI), have been utilized. However, selecting the optimal combination of tasks for clinical language fMRI and interpreting the results are not straightforward.

Several different language tasks have been used in fMRI, as reviewed by Engström et al. [2]. The hemispheric dominance of language functions in fMRI has usually been assessed by using the laterality index (LI), which is relatively concordant with the Wada test [3–5], although differences have been reported as well [6]. However, the limitation of fMRI for lateralizing language function in clinical practice is that fMRI reveals not only essential but also contributory areas in language processing. Previous studies have shown that a combination analysis of several language tasks is more robust than a single task for the detection of essential language areas, and also less sensitive to supplementary areas [7–11]. Furthermore, LIs based on a combination of several tasks are shown to be more congruent with the Wada test or electrocortical stimulation [12, 13]. However, different tasks have seldom been compared quantitatively and thus there is no consensus regarding which tasks should be used in the clinical setting.

In a recent review article, criteria for assessing the usefulness of fMRI language paradigms were defined [14]. According to the criteria, the paradigm should (1) lateralize to the left hemisphere in healthy, right-handed adults, (2) create robust activation, (3) be in concordance with the lateralization measured with other techniques, such as the Wada test, and (4) produce activations in particular target area depending on the surgical target.

In this study, we evaluated five language tasks with healthy volunteers to determine the optimal combination of tasks for clinical use. According to the aforementioned criteria, an optimal combination should cover different aspects of language processing and detect the essential language areas robustly. Furthermore, since our subjects were healthy volunteers, the optimal combination should lateralize to the left hemisphere especially in the right-handed individuals. For this purpose, the task variability in activation power and LI of different tasks and their combinations was examined. In addition, to assess clinical feasibility, the laterality of language functions based on activation maps was assessed by an experienced clinical neuroradiologist.

Materials and methods

Subjects

A total of 20 healthy subjects (11 females; mean age 36 years, range 18–59) participated in this study. Subjects were recruited from the partners of epilepsy patients and from the university and hospital staff. The majority of the subjects were right-handed (n = 16), two were left-handed, and two were ambidextrous (the Edinburgh Handedness Inventory, [15]). Detailed subject demographics are shown in Table 1. The study was approved by the local ethics committee and the subjects gave their written informed consent.

Table 1 Subject demographics

fMRI tasks

Five different fMRI tasks were used: word generation (WGEN) [16–19], letter task (LET) [11, 20], responsive naming (RNAM) [6, 12, 21], sentence comprehension (SCOMP) [3, 22, 23], and word pair (WP). The tasks were chosen so that the brain regions generally thought to be specialized in different aspects of language processes were activated as widely as possible. Furthermore, both auditory and visual stimuli were included to ensure that suitable language tasks were also available for patients with poor eyesight or hearing problems. A conventional block design was used in all tasks. Details of the task paradigms are presented in Table 2.

Table 2 Structures of the task paradigms

In the WGEN task, the active condition was to covertly generate different words starting with a given letter. In the control condition, subjects were shown a fixation cross and were instructed to stop generating words and instead perform a simple finger tapping task with the right and left index fingers.

In the LET task, line drawings of common objects were presented and the task was to press one of two buttons depending on whether the name of the presented object contained a phoneme [i] or not. The control condition consisted of abstract line drawings with smaller images that either were (press the “yes” button) or were not (press the “no” button) a part of the larger image.

In the RNAM task, written descriptions of common nouns (“a long yellow fruit”) were presented. The subjects were instructed to read the description and to covertly name it (“a banana”). The control condition was to look at varying rows of hashes (###### ##).

In the SCOMP task, short spoken sentences, either semantically correct or incorrect, were played to the subjects through headphones. The task was to decide whether the sentence was correct or not, and to press one of two buttons accordingly. The control condition consisted of short series of tones; the subjects were instructed to press the “yes” button if there were two high pitch tones in the series and the “no” button if there were none or only one high pitch tone.

In the WP task, subjects heard pairs of spoken nouns and their task was to covertly name the category to which both of the nouns belonged (e.g., “a ring” and “a bracelet” would both belong to the category “jewelry”). In the control condition, subjects heard pairs of tones and indicated by a button press whether the last tone was higher than the first one.

The Presentation software package (Neurobehavioral Systems, Inc., USA) was used for stimulus delivery. Visual stimuli were projected onto a transparent screen and subjects looked at the screen via a mirror attached to the head coil. Auditory stimuli were delivered to the subjects through MRI-compatible headphones (NordicNeuroLab AS, Bergen, Norway). Responses were recorded using an MRI-compatible response interface (NordicNeuroLab AS).

MRI scanning

Subjects were scanned using a 1.5-T MRI scanner (Siemens Magnetom Avanto, Siemens AG, Germany). Before the functional tasks, a 3D Magnetization Prepared Rapid Acquisition GRE T1-weighted image was acquired for anatomical reference (TR 1,980 ms, TE 3.09 ms, flip angle 15°, matrix 256 × 256, 176 contiguous sagittal slices with isotropic 1 mm³ voxels). All functional images consisted of 36 contiguous slices with a voxel size of 3 × 3 × 3 mm³ acquired parallel to the AC–PC plane with a gradient-echo echo-planar imaging sequence (TE 50 ms, flip angle 90°, matrix 64 × 64). Four additional scans were acquired before each task to ensure stability of the image intensity level. These additional scans were excluded from the analysis. The total scanning time, not including patient positioning, pauses between the tasks, or instructions, was approximately 50 min.

Data analysis

The fMRI data were analyzed with the SPM5 software package (the Wellcome Trust Center for Neuroimaging, UCL, London, UK, www.fil.ion.ucl.ac.uk/spm/) running under Matlab R2007a (The Mathworks Inc., Natick, MA, USA). Preprocessing of the images included motion correction, correction of the acquisition time differences between slices, normalization to the standard ICBM152 space, and finally, spatial smoothing with a Gaussian kernel of 6 mm FWHM. Statistical analysis was performed on a voxel-by-voxel basis using the general linear model [24]. Data of all tasks were entered into the same design as separate sessions to enable analyses of the different task combinations. Nine statistical analyses were performed for each subject, all comparing the active condition and the control condition with a t test: one analysis for each language task separately, combination analyses of the visual and auditory tasks, and a combination analysis of all tasks. Finally, based on the results of the single task analyses, an additional combination analysis of the WGEN, RNAM, and SCOMP tasks was performed.

It has been shown that LI corresponds better with the Wada test if only language areas are used in the LI calculation [25, 26]. Therefore, the following regions of interest (ROIs) were defined: (1) Broca ROI, consisting of Brodmann’s area (BA) 44, BA45, and BA47; (2) Wernicke ROI, consisting of BA21, BA22, BA37, BA39, and BA40; and (3) combined ROI, including both the Broca ROI and the Wernicke ROI complemented with BA46, Heschl gyrus, and the hippocampus. The ROIs were defined using the atlases in WFU PickAtlas [27, 28].

To explore the activation power and laterality of the activation between the analyses in more detail, two scatter plot analyses were performed. First, the sum of positive T values within the combined ROI was calculated for both the left and right hemispheres for each subject. These sums were further normalized to the maximum of the sums over the subjects and analyses to enable the cross-analysis comparison. The normalized sums of positive T values represent the intensity of the activations. The left hemisphere normalized sums were then plotted against the corresponding right hemisphere sums and group mean and standard deviations were calculated for each analysis. Second, to assess the extent of the activations within the combined ROI, the number of activated voxels (p < 0.005, uncorrected) was calculated for the left and right hemisphere. Again, the numbers of activated voxels on the left hemisphere were plotted against the numbers of activated voxels on the right hemisphere and group mean and standard deviations were calculated. In both of the scatter plot analyses, the farther off the dot is from the lower left corner, the stronger the activation has been. If the dot is located near the R = L line, there has been activation on both the left and right hemisphere, whereas if the dot is near either y- or x-axis, the activation has been concentrated mostly on only one of the hemispheres.
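The two scatter plot measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SPM5/Matlab code used in the study: the function names are hypothetical, the T values of the combined-ROI voxels are assumed to be available as plain lists per hemisphere, the critical T value for p < 0.005 (uncorrected) is assumed to be supplied by the caller, and the normalization is assumed to use the single largest sum over both hemispheres, all subjects, and all analyses.

```python
def hemisphere_summaries(t_left, t_right, t_crit):
    """Summarize intensity and extent of activation for one subject and analysis.

    t_left, t_right: T values of the combined-ROI voxels in each hemisphere.
    t_crit: T value corresponding to p < 0.005 (uncorrected). It depends on
            the degrees of freedom, so it is passed in (assumption).
    Returns ((sum_left, sum_right), (n_left, n_right)).
    """
    # Intensity: sum of all positive T values, no threshold applied.
    sum_left = sum(t for t in t_left if t > 0)
    sum_right = sum(t for t in t_right if t > 0)
    # Extent: number of voxels exceeding the fixed threshold.
    n_left = sum(1 for t in t_left if t > t_crit)
    n_right = sum(1 for t in t_right if t > t_crit)
    return (sum_left, sum_right), (n_left, n_right)


def normalize_sums(sums):
    """Scale (left, right) sums by the largest sum over all entries.

    sums: one (left, right) pair per subject and analysis.
    Assumption: a single maximum over both hemispheres is used.
    """
    peak = max(max(pair) for pair in sums)
    if peak == 0:
        return [(0.0, 0.0) for _ in sums]
    return [(left / peak, right / peak) for left, right in sums]
```

Each normalized (left, right) pair then corresponds to one dot in the scatter plot, with the left hemisphere value on the vertical axis and the right hemisphere value on the horizontal axis.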

The laterality index was calculated as LI = (L − R)/(L + R), where L and R are the number of voxels surviving the threshold in these ROIs on the left and the right hemisphere, respectively. The threshold was defined for each subject individually as 80% of the maximum T value of the combined ROI [26]. Language dominance was defined based on the LIs, with LI > 0.1 indicating left hemisphere dominance, LI < −0.1 indicating right hemisphere dominance, and −0.1 ≤ LI ≤ 0.1 indicating bilateral language dominance.
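As an illustration of the LI calculation and the dominance classification, a minimal Python sketch follows. It is not the code used in the study. The function names are hypothetical, the voxel T values are assumed to be given as plain lists, and a voxel is assumed to "survive" when its T value is greater than or equal to the threshold; the text does not specify how ties or an empty voxel set were handled, so returning None in that case is also an assumption.

```python
def subject_threshold(t_combined_roi, fraction=0.8):
    """Subject-specific threshold: a fraction of the maximum T value
    found in the combined ROI (80% in the text)."""
    return fraction * max(t_combined_roi)


def laterality_index(t_left, t_right, threshold):
    """LI = (L - R) / (L + R), with L and R the numbers of voxels
    at or above the threshold in the left and right hemisphere."""
    n_left = sum(1 for t in t_left if t >= threshold)
    n_right = sum(1 for t in t_right if t >= threshold)
    total = n_left + n_right
    if total == 0:
        return None  # no surviving voxels; LI is undefined (assumption)
    return (n_left - n_right) / total


def classify_dominance(li, cutoff=0.1):
    """Map an LI to left, right, or bilateral using the +/-0.1 cutoffs."""
    if li > cutoff:
        return "left"
    if li < -cutoff:
        return "right"
    return "bilateral"


# Hypothetical T values for one subject, for illustration only.
left = [6.0, 9.0, 10.0, 2.0]
right = [8.5, 1.0, 3.0]
thr = subject_threshold(left + right)       # 80% of the maximum T value
li = laterality_index(left, right, thr)     # two left voxels, one right voxel
dominance = classify_dominance(li)
```

Because the threshold scales with each subject's own maximum T value, a roughly comparable share of the strongest voxels enters the calculation for every subject, regardless of the overall activation level.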

Visual assessment of language lateralization

The individual fMRI results for each task and the different task combinations were further evaluated visually by an experienced neuroradiologist blinded to the calculated LI results. The activation threshold was chosen for each subject and analysis individually excluding clearly false positives (e.g., in the midbrain or near the eyes) but including logical activations. Language lateralization was classified separately for the whole brain, Wernicke’s area, and Broca’s area using a five-step dominance scale: definitely left, probably left, bilateral or questionable, probably right, and definitely right dominance. If there were not enough active voxels or there were some other uncertainties in the activations, then laterality was not assessed. The observer was told that the proportion of the non-right-handed individuals was notably higher than in a normal population.

Results

Activation levels and laterality

The overall fMRI activation level varied not only between subjects, but also between tasks and analyses within subject. Figure 1 illustrates the group results of each task using the threshold of p < 0.05 (false discovery rate (FDR) corrected) and corresponding scatter plots of the sums of the T values and of the number of activated voxels (Fig. 1a, WGEN; Fig. 1b, RNAM; Fig. 1c, SCOMP; Fig. 1d, LET; Fig. 1e, WP). The means and standard deviations of both the sums of the T values and the number of activated voxels are illustrated with an ellipse of radii of 2 × SD in each scatter plot in Fig. 1 and presented in Table 3. The standard deviation along the L = R line (SDact) represents the variation in total activation power and the standard deviation perpendicular to that (SDlat) represents the variation in laterality. When the tasks were examined separately, the RNAM produced the strongest activation (highest group mean in terms of both intensity and extent). The SDact was largest in the WP task, thus reducing the robustness of its activation power. The variation in laterality was smallest in the WGEN task and the SCOMP task had the smallest variation in total activation power. To select tasks for the optimal task battery, we used the criteria proposed by Binder [14]. Three tasks were chosen based on the robustness of the activation power and laterality, and capability to activate the language network as widely as possible: WGEN, RNAM, and SCOMP. Among the visual tasks, both RNAM and WGEN produced strong activations with clear left hemisphere lateralization. The LET task was excluded because of its poor activation power. The SCOMP task was selected as the auditory task since it produced much stronger activation with less variation than the WP task.
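The two standard deviations can be obtained by rotating the scatter plot coordinates by 45° so that one axis runs along the L = R line and the other perpendicular to it. The Python sketch below illustrates this under stated assumptions: the function name is hypothetical, each subject is represented by a (right, left) pair, and the sample standard deviation is used, since the text does not state which estimator was applied.

```python
import math
import statistics


def activation_and_laterality_sd(points):
    """Return (SDact, SDlat) for a list of (right, left) pairs.

    SDact: spread along the L = R line (total activation power).
    SDlat: spread perpendicular to the L = R line (laterality).
    Uses the sample standard deviation (assumption).
    """
    root2 = math.sqrt(2)
    # Coordinate along the L = R diagonal.
    along = [(left + right) / root2 for right, left in points]
    # Coordinate perpendicular to the diagonal; positive means more left.
    across = [(left - right) / root2 for right, left in points]
    return statistics.stdev(along), statistics.stdev(across)
```

With this decomposition, a group whose dots all lie on a line parallel to L = R has an SDlat of zero however much the overall activation level varies, which is why the two quantities can be read as separate measures of robustness.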

Fig. 1

Group results of single tasks. a Word generation (WGEN), b responsive naming (RNAM), c sentence comprehension (SCOMP), d letter task (LET), and e word pair (WP). The results are presented using the threshold of p < 0.05 (FDR corrected). The scatter plots on the left are calculated as normalized sums of positive T values and on the right as number of activated voxels (p < 0.005, uncorrected) within the combined ROI for each task. The standard deviation ellipses visualize the group means (central point) and variations in activation power (2 × SD on major axis along the L = R line) and in lateralization (2 × SD on the other major axis, perpendicular to the L = R line). The non-right-handed subjects are highlighted in blue in the scatter plots. The red dot in the scatter plots highlights a subject with highly variable laterality according to the LI and visual inspection (subject #14)

Table 3 The distributions of the activations on the left and right hemisphere

After comparing the five individual tasks, we performed comparisons between different combinations of tasks: (a) all tasks, (b) visual tasks, (c) auditory tasks, and (d) an optimal combination of visual and auditory tasks, i.e., WGEN, RNAM, and SCOMP. Group results of activated areas in all four combination analyses are illustrated in Fig. 2 using the threshold of p < 0.05 (FDR corrected), along with scatter plots similar to those for the single task analyses. The means and standard deviations of the sums of the T values and of the number of activated voxels are illustrated with an ellipse of radii of 2 × SD in each scatter plot in Fig. 2 and further presented in Table 3. At the group level, the combination of all five tasks (Fig. 2a) and the combination of WGEN, RNAM, and SCOMP (Fig. 2d) produced almost identical results in terms of activated areas, but the standard deviations SDlat and SDact were smaller in the combination of WGEN, RNAM, and SCOMP.

Fig. 2

Group results of combination analyses. a All tasks in the same analysis, b visual tasks (WGEN, LET, RNAM), c auditory tasks (SCOMP, WP), and d combination of WGEN, RNAM, and SCOMP. The results are presented using the threshold of p < 0.05 (FDR corrected). The scatter plots on the left are calculated as normalized sums of positive T values and on the right as number of activated voxels (p < 0.005, uncorrected) within the combined ROI for each task. The standard deviation ellipses visualize the group means (central point) and variations in activation power (2 × SD on major axis along the L = R line) and in lateralization (2 × SD on the other major axis, perpendicular to the L = R line). The non-right-handed subjects are highlighted in blue in the scatter plots. The red dot in the scatter plots highlights a subject with highly variable laterality according to the LI and visual inspection (subject #14)

Laterality indices

The LIs for each subject and for each analysis are presented in Tables 4 and 5. The subjects are sorted in the tables based on the sum of the LIs so that the subjects with the most congruent left hemisphere dominance are presented first. Only 12 of the subjects (two left-handed, one ambidextrous) were classified as left hemisphere dominant in all analyses and ROIs. For the rest of the subjects (n = 8, one ambidextrous), the LI varied from clear left hemisphere dominance to bilateral to clear right hemisphere dominance depending on the task, the analysis, and the ROI. Figure 3 illustrates especially intense hemispheric and regional variation between tasks in a single subject (subject #14).

Table 4 Calculated laterality indices and visual assessments for combination analysis of all tasks; visual tasks; auditory tasks; and combined WGEN, RNAM, and SCOMP for all subjects using three different ROI combinations: combination ROI, Wernicke ROI, and Broca ROI
Table 5 Calculated laterality indices and visual assessments for each task and subject using three different ROI combinations: combination ROI, Wernicke ROI, and Broca ROI
Fig. 3

An illustrative case of large variation and contradictory results in activations and language laterality between tasks in a single subject (subject #14). The general activation level varied between the tasks too much to use a single threshold for all tasks. Therefore, the threshold was set separately for each task by the neuroradiologist, based on whole brain activation maps

The RNAM task produced clear left hemisphere dominance for 19 of 20 subjects, all ROIs included, while the largest deviation in language dominance between subjects was found in the LET task. When the LI was calculated based on the Wernicke ROI only, 13 subjects were classified as left hemisphere dominant in all analyses; if the analysis was done using the Broca ROI only, 15 subjects were left hemisphere dominant. Only one subject had exactly the same LI (LI = 1.00) in all analyses and ROIs (subject #19).

Visually assessed language lateralization

The language lateralization assessments conducted by a neuroradiologist are presented in Tables 4 and 5, along with the calculated LIs. The observer’s confidence in the interpretation of laterality was greater when assessing the laterality jointly from the Broca and Wernicke areas than in the separate analyses of these functional areas. When the five individual tasks were assessed separately, the visual interpretation provided variable and even contradictory results (Tables 4 and 5). However, when assessing the combinations of tasks, the interpretation of lateralization became more congruent and the observer’s confidence in the results increased. When the ROI-based LI calculations and the visual assessments of the combinations of tasks were compared, the results were contradictory in one individual.

Discussion

This study compared five different language tasks to determine hemispheric language dominance and overall activation power, and to establish the most suitable task combination for clinical use. The tasks were compared using the criteria proposed in a recent review article [14]. According to these criteria, the task should (1) lateralize to the left hemisphere in healthy, right-handed adults, (2) create robust activation, (3) be in concordance with the lateralization measured with other techniques, such as the Wada test, and (4) produce activations in particular target area depending on the surgical target. The purpose of this study was to find the optimal combination of tasks to be used in defining the language lateralization in clinical patients. Since the total MRI scanning time of these five tasks was relatively long (50 min), we wanted to limit the number of tasks with healthy volunteers before applying them to patients. The use of healthy volunteers naturally limits the applicability of the above-mentioned third criterion, i.e., the comparison of the result with other techniques such as the Wada test. Therefore, the selection of the tasks for the optimal task battery was performed using the three remaining criteria. In the present study, a combination of WGEN, RNAM, and SCOMP was chosen to form the optimal task battery. The combination includes both visual and auditory tasks and the single tasks activate different parts of the language network. In the present study, they all produced relatively robust and strong activations and the pattern of the activation lateralized to the left hemisphere both in the right-handed subjects and the total study population.

The first criterion considers the use of healthy right-handed volunteers. In our study, the majority of the subjects (n = 16) were right-handed, but four non-right-handed subjects were also studied, originally to allow for heterogeneous results and to reflect clinical reality. However, all four of these non-right-handed subjects showed clear left hemisphere dominance, and they were therefore included in the group analyses.

According to the fourth criterion, the fMRI task should produce activations in the surgical target area. Since the surgical target area varies between the patients, a combination of several different tasks activating different aspects of the language network should be used. Furthermore, it has been shown previously that a combined analysis of several tasks produces more reliable results in language lateralization than single tasks alone [6, 8, 10, 11], and the use of both auditory and visual stimuli has been shown to produce less LI variability [29]. Our results agree with these prior findings. There was less LI variation between and within subjects when combined analyses of tasks were used than with single tasks. Moreover, using both auditory and visual tasks in the combination further decreased variability. The clinical observer’s confidence in the visual assessment of lateralization improved when a combination of tasks was used instead of individual tasks. Although the variation measured with SDlat and SDact was in general slightly higher in the combination analyses than in single task analyses, the overall activation power was higher in the combination analyses, thus favoring the use of combination analyses. However, it should be noted that when tasks with relatively poor activation power (LET, WP) were included in the combined analysis, the variability between subjects and within subjects increased.

Since the laterality index is a common method to define the laterality, we wanted to see how much the LI varies depending on the analysis. There are several ways to calculate the LI [30, 31]. It has been shown that the LI depends on the threshold, as higher thresholds produce higher LIs [7, 8]. However, too high a threshold limits the number of voxels in the LI calculation, possibly biasing the results. Furthermore, the same threshold for all subjects is unreliable unless a range of thresholds is used, hence the use of an LI curve has been proposed [32, 33]. Moreover, since the LI curve may not always be easy to interpret, a subject-specific threshold has been proposed instead of a fixed threshold or an LI curve, especially in clinical work [26]. Therefore, all LIs in this study were calculated based on individually set thresholds defined as 80% of the maximum T value within the combined ROI. This method is simple, and takes into account the individual variation in overall activation level. It also balances the number of voxels used in LI calculation between subjects, enabling the comparison of different tasks and analysis methods.

However, individually adjusted thresholds as used in the LI calculation are not suitable for comparing the activation power of the different analyses between subjects. Therefore, a lower, fixed threshold (p < 0.005, uncorrected) was chosen to define the extent of activation, i.e., the number of activated voxels. In the scatter plot analysis, to define the intensity of the activation, we chose to use the sum of positive T values. Inclusion of all the positive values in the analysis takes into account the variation in the overall activation level between the subjects while also providing information about the variation between the hemispheres. It should be noted, however, that although different thresholds were used in this article, they all lead to the same conclusion in selecting the optimal task battery.

Comparison of the visually assessed language lateralization and the calculated LI was performed to emphasize the importance of using a combination of several tasks. In clinical practice, the fMRI results are often analyzed only visually, mainly because visual assessment is usually faster than complex quantitative analysis. Results of the visual assessments based on computerized combination analyses were more congruent than those based on single tasks. Visual assessment of the activations’ laterality cannot be omitted, as it can help to interpret possible atypical language organization in cases of large variability in LIs, regionally and between tasks. The variable and sometimes contradictory results in visual assessment of the individual tasks advocate combination analyses of multiple tasks for clinical use.

The larger variation in activation between subjects in auditory tasks may partly result from the tasks’ control condition, which has been shown to have an effect on LI [34]. In general, it is recommended that the control condition should be as similar as possible to the active condition but with no language processing. However, the tones used in the control conditions in this study were acoustically simpler than words. For some subjects, this situation may have left some primary auditory activation visible in the results, increasing bilateral activity in the temporal lobe. Some subjects may also have utilized verbal strategies in the control condition. Another possible cause for the large variability in the LIs in auditory tasks is possible misalignment of the atlas-based ROIs and normalized anatomy. To be able to use predefined ROIs, the individual data must be normalized to the standard space. The normalization procedure does not always guarantee complete alignment of every anatomical region. Therefore, the voxels actually belonging to the primary auditory cortex may have been falsely taken into calculation.

The largest and inevitable limitation of our study is that we cannot validate the laterality provided by the fMRI using other techniques such as the Wada test because we used healthy volunteers. Therefore, our next goal is to repeat the study with Wada-tested patients selected for an operative treatment.

Conclusions

We have compared five fMRI language paradigms to choose an optimal combination for clinical use. The combination of WGEN, RNAM, and SCOMP tasks seems optimal in terms of activation power, robustness to detect essential language areas, and scanning time. Furthermore, we have shown that LI is highly dependent on the selected language task, and whether it is calculated based on a single task or on a combination of several tasks. For clinical purposes, the use of both auditory and visual stimuli should be advocated, since using only one type of stimulus may bias the results. However, only tasks that are proven to activate the cognitive network of interest should be included in the combination analysis. Determining language lateralization is often challenging and both single tasks and combinations of tasks should be used to arrive at a comprehensive understanding of individual cortical language areas.