Introduction

Fluent speech production is a critical aspect of real-world language processing and a clinically relevant aspect of language deficits following stroke. The focus on fluency in aphasia has a long history and continues to be central to aphasia diagnosis and treatment. Common clinical measures of fluency focus on functional aspects of producing connected speech, such as spontaneous-speech rate (words per minute), average length of utterance, and the fluency subscore of the Western Aphasia Battery (WAB; Kertesz, 1982), a clinical rating that integrates across grammatical, lexical, and speech-motor aspects of connected speech. Indeed, it has long been recognized that fluent production of connected speech is not “one thing”; rather, it is a consequence of effective coordination of multiple subsystems. For example, in primary progressive aphasia (PPA), speech rate (words per minute) is correlated with, but dissociable from, grammatical deficits (Thompson et al., 2012). Recent large-scale studies of individuals with post-stroke aphasia have combined principal components analysis with lesion-symptom mapping (LSM) and found two main subcomponents of fluency that are dissociable both behaviorally and neuroanatomically: segment-level (phonological/phonetic) speech production deficits and sentence- or utterance-level speech production deficits (Halai, Woollams, & Lambon Ralph, 2017; Lacey, Skipper-Kallal, Xing, Fama, & Turkeltaub, 2017). These studies identified a frontoparietal speech production system in which sentence-level deficits were associated with frontal damage and segment-level deficits were associated with inferior parietal damage (for a review, see Mirman & Thye, 2018).

In a detailed LSM analysis of connected speech deficits in PPA, Wilson et al. (2010) found that degeneration of frontal regions was associated with speech sound distortions (phonetic errors) and syntactic deficits, whereas phonological errors were associated with posterior temporal degeneration. The anterior locus for motor speech deficits converges with LSM evidence from apraxia of speech (AOS), a disorder of articulatory planning and programming that is distinguished primarily by speech distortions and effortful articulation but also features substitution and addition of well-formed segments (phonological errors). Two recent LSM studies found an association between AOS and damage to precentral and postcentral gyri (Basilakos, Rorden, Bonilha, Moser, & Fridriksson, 2015; Itabashi et al., 2016). Notably, Wilson et al.’s PPA study and the apraxia of speech studies quantified segmental (phonological/phonetic) errors in the context of connected speech (e.g., picture description). In contrast, LSM studies of phonological errors during single-word production (picture naming) in post-stroke aphasia have found that these errors are associated with damage to inferior parietal (supramarginal gyrus) and somatosensory (postcentral gyrus) regions (Dell, Schwartz, Nozari, Faseyitan, & Coslett, 2013; Mirman, Chen, et al., 2015; Mirman, Zhang, Wang, Coslett, & Schwartz, 2015b; Schwartz, Faseyitan, Kim, & Coslett, 2012).

The picture that emerges is that speech distortion errors localize to regions (frontal and/or central) that lie anterior to the regions associated with phonological errors (anterior parietal and/or posterior temporal). This difference may reflect different stages of articulatory planning, as would be predicted by neurocomputational models of speech production, such as the Hierarchical State Feedback Model (Hickok, 2012; see also Walker & Hickok, 2015) and the DIVA model (Bohland, Bullock, & Guenther, 2010; Tourville & Guenther, 2011). These models localize segment-level articulatory planning closer to the central sulcus compared with higher levels of planning, which consist of more posterior auditory syllable and lexical target components (posterior superior temporal gyrus) and more anterior motor syllable planning (inferior frontal gyrus). However, the existing data may also reflect task differences, because the speech distortions were assessed in the context of connected speech, whereas the phoneme substitutions were assessed in a single-word production task. To address this, the present study examined segment-level articulatory planning and execution errors in the context of single-word production (i.e., without the need to produce connected speech) and with reduced lexical-semantic demands (a word repetition task).

LSM studies of sentence-level speech production deficits have similarly used clinical measures, such as WAB Fluency, mean length of utterance, and words per minute (Basilakos et al., 2014; Catani et al., 2013; Rogalski et al., 2011), or composite scores that include multiple aspects of sentence production (den Ouden et al., 2019; Mandelli et al., 2014; Rogalski et al., 2011; Wilson et al., 2010). These studies consistently implicated damage to frontal regions (particularly the inferior frontal gyrus) and the underlying white matter (particularly the anterior segment of the arcuate fasciculus, the superior longitudinal fasciculus, and the frontal aslant tract) as the key neural correlates of sentence-level production deficits. However, because they relied on clinical and composite measures, these studies did not make some important psycholinguistic distinctions. First, measures such as utterance length and speech rate (words per minute) depend on articulatory agility as well as sentence planning, so they integrate rather than distinguish between articulatory-motor and sentence planning deficits. Second, these measures do not distinguish between structural aspects of sentence planning (i.e., converting a holistic message representation into a sequence of words) and morphosyntactic aspects (i.e., appropriate use of function or closed-class words and bound grammatical morphemes). Structural and morphological deficits have been dissociated behaviorally, suggesting that they are supported by distinct cognitive processes or systems (Rochon, Saffran, Berndt, & Schwartz, 2000), although (to our knowledge) the neural correlates of this distinction have not been examined with lesion-symptom mapping.

To address this gap, the present study examined two measures of grammatical processing in sentence production that distinguish between structural aspects of sentence production (proportion of words in sentences) and morphological aspects (proportion of closed class words). These measures also minimize the influence of working memory and short-term memory (WM/STM) and of phonological or articulatory factors, because they depend neither on production of long sentences nor on a fast speech rate. Reducing WM/STM and articulatory agility demands allows us to better isolate sentence-level planning processes, which should rely on mid-anterior frontal regions if they are similar to other types of sequential action planning (Botvinick, 2008) and/or on temporo-parietal regions if they are similar to other types of event or thematic processing (Bedny, Dravida, & Saxe, 2014; Mirman, Landrigan, & Britt, 2017; Thothathiri, Kimberg, & Schwartz, 2012).

The major goal of the present study was to produce a more focused and theoretically relevant characterization of the subcomponents of fluency than has been achieved to date. To this end, we examined the neural correlates of both segment-level and sentence-level speech production deficits using refined measures that minimize the influence of other processes, are rooted in prior theoretical, computational, and behavioral research, and have proven utility for producing clinical dissociations (Galluzzi, Bureca, Guariglia, & Romani, 2015; Rochon et al., 2000).

The second goal of the present study was to reevaluate prior LSM research on fluency deficits using new behavioral measures and LSM methods. There has been growing emphasis across the behavioral and biological sciences on increasing rigor and reproducibility, but direct replication of an LSM study, which would require large-scale behavioral and neuroimaging testing of people with acquired language (or other cognitive) deficits, is not feasible for practical or financial reasons. Insofar as the present results converge with prior studies, they provide critical converging evidence across laboratories that helps to establish which patterns are robust enough to emerge in different participant samples and with somewhat different measures and methods. Further, instead of standard mass-univariate voxel-based lesion-symptom mapping, we used voxel-level support vector regression lesion-symptom mapping (SVR-LSM; Zhang, Kimberg, Coslett, Schwartz, & Wang, 2014). SVR-LSM is a multivariate lesion-symptom mapping method that is particularly well suited to studying symptoms that may have multiple causes, because it can detect independent contributions from distinct brain regions.

In sum, we applied multivariate LSM methods to precisely defined behavioral deficits at the segment-level and sentence-level of production to provide a more precise picture of the neural system that is required for fluent language production.

Methods

Participants and Lesion Data

The data were drawn from an ongoing large-scale study of language processing following left hemisphere stroke. Analyses of other language deficits in earlier subsets of the participants have been reported in several previous articles (Chen, Middleton, & Mirman, 2018; Mirman, Zhang, et al., 2015; Schwartz et al., 2009, 2012; Thothathiri et al., 2012), which also provide more detailed descriptions of the participants and imaging methods. The study was performed in accordance with protocols approved by the Institutional Review Boards at the Einstein Healthcare Network and University of Pennsylvania School of Medicine. The participants were survivors of left hemisphere stroke (not bilateral or solely subcortical) who had active aphasia or had recovered from clinically significant aphasia but continued to report language deficits. All had English as their first language, were right-handed before stroke, were able to produce at least one correct response on the Philadelphia Naming Test (PNT; Roach, Schwartz, Martin, Grewal, & Brecher, 1996), and passed a pure tone audiometric screening to verify sufficient hearing to perform the word repetition task. Participants were tested outside the acute phase, at least 2 months post onset, with almost all tested in the chronic phase (Footnote 1). The sample included a wide range of aphasia subtypes, aphasia severity (based on the WAB Aphasia Quotient), and WAB Fluency subscores. A subset had co-occurring apraxia of speech (AOS), based on the Apraxia Battery for Adults (Dabul, 2000). Apraxia of speech is considered a disorder of high-level speech motor programming, affecting fluency at the phonetic level of speech production. Participants with significant peripheral dysarthria were excluded from this study to rule out peripheral speech-motor control causes of fluency deficits. Table 1 provides further clinical and demographic details.

Table 1. Background information about participants in the main analyses

Lesion location was assessed based on MRI or CT brain scans, following the same procedures as previous studies of this data set (or subsets of these data). For the MRI scans, lesions were manually segmented on each participant’s T1-weighted structural image; then the structural scans and lesion maps were registered to the Colin27 template in Montreal Neurological Institute (MNI) space by an automated process (Avants, Schoenemann, & Gee, 2006). For the CT scans, the lesion was drawn directly onto the Colin27 template after rotating it (pitch only) to match the approximate slice plane of the participant’s scan. Both the coding of the behavioral data and the lesion drawing were done by trained individuals who were blind to the hypotheses tested here. Lesion coverage was good throughout the left middle cerebral artery (MCA) territory, particularly the dorsal speech production system structures of the frontal lobe and inferior parietal lobe (Figure 1).

Fig. 1. Top: Lesion overlap for all 115 participants in the articulatory deficits analyses, thresholded to include only voxels that were lesioned in at least 10 participants (max = 65 participants). Bottom: Lesion overlap for all 46 participants in the grammatical deficits analyses, thresholded to include only voxels that were lesioned in at least 5 participants (max = 33 participants).

Quantifying Articulatory Deficits

To quantify segment-level articulatory difficulties under minimally demanding speaking conditions, we measured phonetic errors in single-word repetition. A similar measure was previously used to identify articulatory impairments that have consequences for phonological error production (Galluzzi et al., 2015; Romani & Galluzzi, 2005; Romani, Olson, Semenza, & Granà, 2002). These studies, conducted with Italian-speaking individuals with aphasia, analyzed the phonological errors of participants who scored low or high on the articulatory-deficit measure and found that only the articulatory group showed a consistent bias toward CV-structure simplification. They concluded that there are two distinct loci for phonological error production: articulatory deficit and failed lexical-phonological retrieval (for a related account, see Goldrick & Rapp, 2007). Phonetic errors are a key hallmark of apraxia of speech, but more traditional measures of apraxia of speech impose additional demands (such as complex sequential processing), so phonetic errors in word repetition provide a more focused characterization of articulatory deficits. This background establishes the phonetic error proportion (PEP, defined below) as a measure of articulatory deficit that is relevant to both linguistic theory and clinical classification.

Using participants’ data from the 175-item Philadelphia Repetition Test (PRT; Dell, Martin, & Schwartz, 2007), we identified responses containing phonetic errors, defined as initial sound struggle, intrusive schwa, impaired prosody, episodic issues with nasal resonance, distorted articulation, or sound combinations that violated English phonotactics. Such phonetic errors were counted regardless of whether they occurred in the context of a correct target word attempt or an incorrect response. The following were not counted as phonetic errors: (1) dialectal or regional accent influences that create stretched vowels or shortened diphthongs; (2) responses spoken with upward intonation, as if to confirm what was heard; (3) undistorted sound substitutions (in order to isolate articulatory distortion from phonological errors). For each participant, the number of target words repeated with one or more phonetic errors was divided by the total number of target words (n = 175) to arrive at a phonetic error proportion (PEP).
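As a minimal sketch of the PEP arithmetic, assuming each PRT trial has already been coded by a rater with a single yes/no flag for "one or more phonetic errors" (the record type `CodedTrial` and its fields are hypothetical, not part of the PRT materials), the proportion is the count of flagged target words divided by 175:

```python
from dataclasses import dataclass
from typing import Sequence

PRT_ITEMS = 175  # number of target words on the Philadelphia Repetition Test


@dataclass
class CodedTrial:
    """Hypothetical record for one coded repetition trial."""
    target: str
    has_phonetic_error: bool  # True if the response contained >= 1 phonetic error


def phonetic_error_proportion(trials: Sequence[CodedTrial], n_items: int = PRT_ITEMS) -> float:
    """Targets repeated with one or more phonetic errors, divided by all targets.

    Undistorted sound substitutions are assumed to have been excluded during
    coding, so they never set has_phonetic_error.
    """
    if len(trials) != n_items:
        raise ValueError(f"expected {n_items} coded trials, got {len(trials)}")
    n_flagged = sum(1 for t in trials if t.has_phonetic_error)
    return n_flagged / n_items


# Example: a participant with 35 of 175 targets flagged has PEP = 0.2
trials = [CodedTrial(f"item{i}", i < 35) for i in range(PRT_ITEMS)]
pep = phonetic_error_proportion(trials)
```

Because the flag is per target word, a response with several distortions still contributes only one to the numerator, which matches the "one or more phonetic errors" definition above.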

Two speech pathologists were trained by jointly coding all 175 PRT items from 10 participants. For each participant, the coders began by listening to the first 10–20 responses to get a sense of the speaker’s individual speaking characteristics. Following training, intercoder reliability was assessed on the PRTs of 30 participants: an independent investigator who was not involved in the coding selected a subset of 35 responses (20%) from each of these 30 PRTs, and each of the two speech pathologists applied the phonetic coding scheme independently to the same 30 subsets. Interrater agreement on this subset of 1,050 trials (30 participants × 35 responses) was 94.3%. Once coding reliability was established, the remaining responses from these 30 participants and the full PRT responses for the remaining participants were divided between the two speech pathologists and coded independently.
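The reliability design can be sketched as follows, assuming simple (uncorrected) percent agreement, which is what the text reports. The random seed and the way the 35 items are drawn are assumptions for illustration; the coder label lists are invented, and the exact number of agreements is not reported. The sketch only shows that roughly 990 agreements out of 1,050 correspond to 94.3%.

```python
import random
from typing import List, Sequence


def reliability_subset(n_items: int = 175, fraction: float = 0.20, seed: int = 0) -> List[int]:
    """Draw a random subset of item indices (35 of 175 at 20%) for double coding."""
    k = round(n_items * fraction)
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_items), k))


def percent_agreement(codes_a: Sequence[bool], codes_b: Sequence[bool]) -> float:
    """Percentage of trials on which two coders gave the same phonetic-error code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("coders must code the same, non-empty set of trials")
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * agreements / len(codes_a)


# 30 participants x 35 double-coded responses = 1,050 trials
n_trials = 30 * len(reliability_subset())
# Illustrative codes only: 990 agreements out of 1,050 is about 94.3%
coder_a = [True] * n_trials
coder_b = [True] * 990 + [False] * (n_trials - 990)
agreement = percent_agreement(coder_a, coder_b)
```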

Quantifying Grammatical Deficits

Our measures of grammatical deficits were derived from the Quantitative Production Analysis (QPA; Rochon et al., 2000; Saffran, Berndt, & Schwartz, 1989), a standardized procedure for eliciting and coding narrative speech production in aphasia. The narratives were produced under instructions to retell the story of “Cinderella” or another familiar fairy tale, resulting in a narrative speech sample that included at least 150 words after removal of repetitions and filled pauses. The speech samples were transcribed and coded by a speech pathologist or research assistant specifically trained to perform transcription and QPA coding following guidelines based on published QPA procedures, adapted to accommodate fluent as well as nonfluent participants. During training, coders achieved approximately 90% agreement on their transcription and coding of utterance boundaries, utterance content, and grammatical structure. Each narrative was segmented to highlight word groupings that constitute a sentence (minimally, a noun/pronoun followed by a verb) versus unstructured groupings and isolated words. Multiple grammatical measures were derived, from which we selected two that capture different aspects of grammatical processing while minimizing the influence of articulatory agility and WM/STM demands. Both measures were proportions of total words produced, thus also controlling for the total number of words produced by the participant. In the QPA coding system, only the final occurrence of a repeated word is included (except where repetition is used for emphasis), so these measures should not be distorted by word repetition or perseveration (see Appendix A, section III.G of Saffran et al., 1989).

The first measure was the proportion of words in the sample that were closed class, that is, function words such as pronouns, determiners, conjunctions, and prepositions. Along with bound suffixes, such as plural /-s/ and past tense /-d/, closed class words are morphosyntactic elements of grammar, deriving function and meaning from the surrounding open class content (nouns and verbs). We did not measure production of bound morphemes, because these are relatively difficult to articulate and so may be omitted either as a result of grammatical deficits or as a result of articulatory deficits. Closed class words are less difficult to articulate, so we can be more confident that their omission is tied to grammatical deficits.
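A minimal sketch of the closed-class proportion follows. It assumes a tokenized transcript, uses a small illustrative word list (the QPA guidelines define the real closed-class inventory), and approximates the QPA repetition rule by keeping only the last word in a run of immediate repeats; emphatic repetition, which QPA retains, is not detected here. All names are hypothetical.

```python
from typing import List, Sequence

# Illustrative subset only; the QPA coding guidelines define the real closed-class inventory.
CLOSED_CLASS = {
    "the", "a", "an",                          # determiners
    "he", "she", "it", "they", "her", "him",   # pronouns
    "and", "but", "or",                        # conjunctions
    "to", "of", "in", "on", "at", "with",      # prepositions
}


def drop_repetitions(tokens: Sequence[str]) -> List[str]:
    """Keep only the final token in each run of immediately repeated words.

    Simplified stand-in for the QPA rule; emphatic repetition is not detected.
    """
    kept = []
    for i, tok in enumerate(tokens):
        if i + 1 < len(tokens) and tokens[i + 1].lower() == tok.lower():
            continue  # a later copy of this word follows, so skip this one
        kept.append(tok)
    return kept


def proportion_closed_class(tokens: Sequence[str], closed_class=CLOSED_CLASS) -> float:
    """Closed-class words divided by all words, after removing repetitions."""
    words = drop_repetitions(tokens)
    if not words:
        raise ValueError("empty narrative sample")
    return sum(w.lower() in closed_class for w in words) / len(words)


sample = "the the girl went to the ball".split()
p_closed = proportion_closed_class(sample)  # 3 closed-class words / 6 words = 0.5
```

Dividing by total words (after repetition removal) is what makes the measure insensitive to how much a participant says overall.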

The second measure of grammatical deficits was proportion of words falling within sentence boundaries. This measure captures each individual’s ability to produce the absolute minimum of English sentence structure—a noun and a verb—without requiring the individual to produce long or complex sentences. Thus, proportion of words in sentences provides a measure of the structural aspects of grammar that minimizes WM/STM constraints. Other measures of sentence structure, such as production of sentences with embedded clauses, require that participants produce relatively long sentences. Such long and complex sentences require working memory and short-term memory to maintain the sentence structure and articulatory ability to produce a long utterance. In contrast, producing a minimal two-word noun-verb sentence minimizes those demands. To further isolate the contribution of sentence structure deficits, follow-up analyses controlled for effects of syntactic processing in comprehension (reversible sentence comprehension: Thothathiri et al., 2012), single word retrieval (picture naming accuracy from the PNT), and articulatory ability (PEP).
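The arithmetic of the second measure can be sketched as below. The sentence/non-sentence judgment is made by trained coders, so this sketch simply takes each segment's coder label as input. The tuple layout is a hypothetical representation, and repetitions are assumed to have been removed already.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: each segment is (words, coder_marked_as_sentence)
Segment = Tuple[Sequence[str], bool]


def proportion_words_in_sentences(segments: List[Segment]) -> float:
    """Words inside coder-marked sentence boundaries divided by all words."""
    total = sum(len(words) for words, _ in segments)
    if total == 0:
        raise ValueError("no words in narrative sample")
    in_sentences = sum(len(words) for words, is_sentence in segments if is_sentence)
    return in_sentences / total


narrative = [
    ("she ran".split(), True),        # minimal noun/pronoun + verb sentence
    ("the slipper".split(), False),   # unstructured grouping
    ("prince".split(), False),        # isolated word
    ("he found her".split(), True),
]
p_sent = proportion_words_in_sentences(narrative)  # 5 / 8 = 0.625
```

A two-word sentence such as "she ran" counts in full, which is why the measure does not require long or complex sentences.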

Support Vector Regression Lesion-Symptom Mapping

Lesion-symptom mapping analyses were performed using support vector regression (SVR-LSM) (Zhang et al., 2014). SVR-LSM leverages a multivariate machine learning algorithm to discover lesion-behavior relationships. Compared with standard mass-univariate voxel-based lesion-symptom mapping methods, SVR-LSM is better able to capture independent contributions of multiple brain regions to performance and is less sensitive to differences in statistical power that arise from differences in proportion of participants with lesions in each voxel. These advantages are particularly important for the present study, because fluency appears to depend on multiple cognitive processes that may have distinct neural bases. As a standard pre-processing step for SVR-LSM, each participant’s voxel-wise lesion vector was normalized by dividing each voxel’s binary lesion status value by the square root of the total lesion volume. This also serves as a control for the impact of lesion volume, referred to as “direct total lesion volume control” (Zhang et al., 2014). SVR-LSM requires setting two free parameters (cost and gamma), which must be done in a principled way that is independent of the researchers’ hypotheses. For each of the three main analyses, these parameters were selected based on 5-fold cross-validation to maximize prediction accuracy (as recommended in Zhang et al., 2014). Follow-up analyses used the same parameters as the corresponding main analysis.
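The lesion-volume normalization step can be illustrated with a short sketch. It assumes a binary lesion vector and measures total lesion volume as a voxel count. The SVR fit itself and the cross-validated search over cost and gamma are not shown, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def normalize_lesion(lesion: Sequence[int]) -> List[float]:
    """Scale a binary lesion vector by 1/sqrt(total lesion volume in voxels)."""
    volume = sum(lesion)
    if volume == 0:
        raise ValueError("lesion map contains no lesioned voxels")
    scale = 1.0 / math.sqrt(volume)
    return [v * scale for v in lesion]


small = normalize_lesion([1, 1, 0, 0, 0, 0, 0, 0, 0])
large = normalize_lesion([1, 1, 1, 1, 1, 1, 1, 1, 0])
# Both vectors now have unit Euclidean length, regardless of lesion size
norm_small = math.sqrt(sum(v * v for v in small))
norm_large = math.sqrt(sum(v * v for v in large))
```

Each lesioned voxel becomes 1/√V, so the sum of squares is V · (1/V) = 1 for every participant. This is why the step doubles as a control for total lesion volume.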

In addition to controlling for lesion volume, the SVR-LSM analyses included only voxels with sufficient lesion involvement (Sperber & Karnath, 2017). The minimum lesion involvement recommended by Sperber and Karnath is 5% of the overall sample, though many studies use a 10% threshold. For the articulatory deficit (PEP) analysis (N = 115), only voxels where at least ten participants had lesions were included in the analysis. For the grammatical deficits analyses, the sample was substantially smaller (N = 46), so the minimum lesion involvement was reduced to five participants. SVR-LSM produces a voxel-wise map of raw regression β values. Statistical significance for the β values was calculated using a permutation test (2,000 permutations) and corrected for multiple comparisons at a false discovery rate (FDR; Genovese, Lazar, & Nichols, 2002; Zhang et al., 2014) of q < 0.05. The final results include only voxels that passed the FDR threshold, were in the top 5% of raw β values, and formed clusters larger than 50 voxels (Footnote 2).
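Two of the thresholding steps above are sketched below: the minimum-lesion-count voxel mask and an FDR cutoff. The sketch assumes the standard Benjamini-Hochberg step-up rule (the form applied to imaging maps by Genovese et al., 2002) and takes the permutation p-values as already computed. The top-5% β and 50-voxel cluster criteria are not shown, and the function names and example values are hypothetical.

```python
from typing import List, Optional, Sequence


def voxels_with_min_lesions(lesions: Sequence[Sequence[int]], min_lesioned: int) -> List[int]:
    """Indices of voxels lesioned in at least `min_lesioned` participants."""
    n_voxels = len(lesions[0])
    counts = [sum(row[j] for row in lesions) for j in range(n_voxels)]
    return [j for j, c in enumerate(counts) if c >= min_lesioned]


def bh_threshold(p_values: Sequence[float], q: float = 0.05) -> Optional[float]:
    """Benjamini-Hochberg cutoff: largest sorted p(k) with p(k) <= (k / m) * q.

    Voxels with p <= the returned cutoff survive; None means nothing survives.
    """
    m = len(p_values)
    cutoff = None
    for k, p in enumerate(sorted(p_values), start=1):
        if p <= k / m * q:
            cutoff = p
    return cutoff


lesions = [[1, 0, 1], [1, 0, 0], [1, 1, 0]]  # 3 participants x 3 voxels
kept = voxels_with_min_lesions(lesions, min_lesioned=2)  # only voxel 0 qualifies

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
cutoff = bh_threshold(p, q=0.05)  # the two smallest p-values survive
```

In this framing, the PEP analysis would call `voxels_with_min_lesions(..., min_lesioned=10)` and the grammatical analyses `min_lesioned=5`, with FDR applied only to the retained voxels.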

Results

Articulatory Deficits: PEP

Phonetic error proportion (PEP) was the primary behavioral measure of articulatory deficits, with M = 0.10, standard deviation (SD) = 0.12, and range = 0.00–0.545 (the full distribution of this measure is shown in the left panel of Figure 2). PEP was fairly strongly correlated with aphasia severity (WAB AQ: r = −0.532, p < 0.0001) and WAB Fluency scores (r = −0.508, p < 0.0001). This sample of 115 participants included 32 participants with AOS, as defined by the Apraxia Battery for Adults (Dabul, 2000). Because speech sound distortions weigh heavily in the diagnosis of AOS, it is unsurprising that participants diagnosed with AOS had significantly higher PEP (logistic regression: estimate = 1.65, SE = 0.049, z = 33.61, p < 0.0001; AOS: M = 0.224, 95% confidence interval (CI) = 0.213–0.235; no AOS: M = 0.052, 95% CI = 0.049–0.056). Even after controlling for WAB AQ, PEP was significantly associated with WAB Fluency and AOS (both p < 0.05). There also was a fairly strong correlation between PEP scores and phonological errors, as measured by the proportion of nonwords on the Philadelphia Naming Test (r = 0.54, p < 0.0001). This, too, was expected, given the well-known co-occurrence of phonetic and phonological errors in AOS (Footnote 3).
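The text does not specify the logistic regression model. The following arithmetic assumes a trial-level binomial model with AOS as the only predictor: 32 × 175 trials for the AOS group and 83 × 175 for the rest, since 115 − 32 = 83. Under that assumption, the estimate is the log-odds difference between group proportions, z is the estimate divided by its SE, and each group CI is a Wald interval on the logit scale. The numbers agree with the reported values up to rounding. That agreement is consistent with the assumption but does not confirm it.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def wald_ci(p: float, n_trials: int, z: float = 1.96):
    """95% Wald interval for a binomial proportion, computed on the logit scale."""
    se = 1.0 / math.sqrt(n_trials * p * (1.0 - p))
    return inv_logit(logit(p) - z * se), inv_logit(logit(p) + z * se)


# Log-odds difference between the reported group means (AOS vs no AOS)
log_odds_diff = logit(0.224) - logit(0.052)   # about 1.66, near the reported 1.65
z_value = 1.65 / 0.049                         # about 33.7, near the reported 33.61
ci_aos = wald_ci(0.224, n_trials=32 * 175)     # about (0.213, 0.235)
```

With `wald_ci(0.052, n_trials=83 * 175)`, the same calculation gives roughly (0.0485, 0.0557). That interval is close to the reported no-AOS CI.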

Fig. 2. Distributions of behavioral scores for primary analyses. Left: Articulatory deficits score: phonetic error proportion (PEP). Middle: Grammatical deficits score: proportion of closed class words. Right: Grammatical deficits score: proportion of words in sentences.

SVR-LSM (cost = 1, gamma = 1.5; prediction accuracy: r = 0.481, p < 0.001) of phonetic error proportion revealed that this measure of articulatory deficits was most strongly associated with damage to the mid-posterior portion of the dorsal speech stream (Figure 3A, red voxels): the postcentral gyrus and the inferior parietal lobule (primarily supramarginal gyrus). This result is very similar to a prior mass-univariate VLSM analysis of the tendency to produce phonological errors in picture naming (Schwartz et al., 2012). As noted above, we calculated phonological errors in the manner of the prior study (proportion nonword responses on the PNT) for the same 115 participants included in this PEP analysis and mapped the association between these errors and lesion location using SVR-LSM. The results revealed that the lesion correlates for phonetic errors in word repetition (PEP) and phonological errors in picture naming were almost perfectly overlapping (Figure 3A, blue voxels correspond to phonological errors in picture naming, purple voxels correspond to the overlap). SVR-LSM of PEP controlling for phonological error proportion left no voxels that survived FDR correction. That is, phonetic errors in word repetition and nonword errors in picture naming seem to have largely the same lesion correlates: the mid-posterior portion of the dorsal speech system. Two possible reasons, not mutually exclusive, are that many phonological errors may be phonetically inspired (Galluzzi et al., 2015; Romani & Galluzzi, 2005) or that the neural networks for the phonological and phonetic encoding of speech segments may overlap too strongly to be disentangled with present methods.
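The text does not say how the covariate (phonological error proportion) was controlled in the SVR-LSM of PEP. One common approach, assumed here for illustration, is to regress the covariate out of the behavioral score and submit the residuals to SVR-LSM. A minimal least-squares sketch follows. The variable names and the five participants' scores are hypothetical.

```python
from typing import List, Sequence


def residualize(y: Sequence[float], x: Sequence[float]) -> List[float]:
    """Residuals of y after ordinary least-squares regression on x (with intercept)."""
    n = len(y)
    if n != len(x) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("covariate has zero variance")
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical scores for five participants
pep = [0.02, 0.05, 0.10, 0.20, 0.40]
nonword_errors = [0.01, 0.04, 0.06, 0.15, 0.25]
pep_resid = residualize(pep, nonword_errors)  # PEP variance not shared with nonword errors
```

If the two error types share nearly all of their lesion-related variance, the residuals carry little lesion-predictable signal. This is one way to read the absence of surviving voxels in that analysis.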

Fig. 3. (A, top row) Results of SVR-LSM for phonetic error proportions in word repetition (red), nonword errors in picture naming (blue), and their overlap (purple). (B, bottom row) Results of SVR-LSM for proportion of words in sentences.

Grammatical Deficits: Proportion of Closed Class Words

The overall mean proportion of words in a participant’s sample that were closed class was 0.48 (SD = 0.098, range = 0.18–0.61; the full distribution of this measure is shown in the middle panel of Figure 2). The proportion of closed class words was significantly correlated with aphasia severity (WAB AQ: r = 0.417, p < 0.01), WAB Fluency score (r = 0.497, p < 0.001), and words per minute during the narrative production task (r = 0.500, p < 0.001). After controlling for aphasia severity, proportion of closed class words was still significantly associated with WAB fluency and words per minute (both p < 0.05).
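The text does not state whether "controlling for aphasia severity" used partial correlation or multiple regression. With a single covariate, the two yield equivalent significance tests. The following sketch uses the first-order partial correlation formula. Two inputs are the correlations reported above. The correlation between WAB Fluency and WAB AQ is not reported, so 0.70 is a purely hypothetical placeholder.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y, controlling for z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# x = proportion closed class, y = WAB Fluency, z = WAB AQ
r_closed_fluency = 0.497   # reported
r_closed_aq = 0.417        # reported
r_fluency_aq = 0.70        # HYPOTHETICAL: not reported in the text
pr = partial_correlation(r_closed_fluency, r_closed_aq, r_fluency_aq)  # about 0.32
```

The significance of a first-order partial correlation is tested with n − 3 degrees of freedom, so with N = 46 the test uses 43 df.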

SVR-LSM (cost = 1, gamma = 1.5; prediction accuracy: r = 0.406, p < 0.01) of proportion of closed class words revealed no voxels that survived FDR correction, indicating that we were not able to detect a consistent lesion pattern that was associated with morphosyntactic deficits measured by production of closed class words.

Grammatical Deficits: Proportion of Words in Sentences

The overall mean proportion of words in a participant’s sample that were within sentence boundaries was 0.70, with variation between individuals covering essentially the full range (SD = 0.30, range = 0.03–1.0; the full distribution of this measure is shown in the right panel of Figure 2). The proportion of words in sentences was significantly correlated with aphasia severity (WAB AQ: r = 0.485, p < 0.001), WAB Fluency score (r = 0.571, p < 0.001), and words per minute during the narrative production task (r = 0.484, p < 0.001). After controlling for aphasia severity, proportion of words in sentences was still significantly associated with WAB fluency and words per minute (both p < 0.05).

Figure 3B shows the SVR-LSM (cost = 1, gamma = 1; prediction accuracy: r = 0.646, p < 0.001) results for proportion of words in sentences, which revealed that this measure of grammatical deficits was primarily associated with frontal lobe damage. In particular, reduced proportion of words in sentences was most strongly associated with damage to the middle frontal gyrus and inferior frontal gyrus (primarily pars triangularis, with substantial involvement of pars orbitalis, and very little extension into pars opercularis). There also was a smaller cluster of voxels in the postcentral gyrus and inferior parietal lobule.

Following prior reports that fluency deficits are associated with damage to frontal white matter (Basilakos et al., 2014; Bonilha & Fridriksson, 2009; Catani et al., 2013; Dronkers, Plaisant, Iba-Zizen, & Cabanis, 2007; Fridriksson, Guo, Fillmore, Holland, & Rorden, 2013; Halai et al., 2017; Wilson et al., 2010), we tested overlap with the frontal aslant tract and the anterior segment of the arcuate fasciculus but found only minimal overlap with these white matter tracts. The white matter tract regions were defined using the atlas developed by Catani and colleagues (Catani, Dell’Acqua, Bizzi, et al., 2012; Catani, Dell’Acqua, Vergani, et al., 2012; Catani & Thiebaut de Schotten, 2012; see Footnote 4) using a 50% threshold (i.e., regions that corresponded to those white matter tracts in at least 50% of neurologically typical participants). For both the anterior segment of the arcuate fasciculus and the frontal aslant tract, there was less than 5% overlap between the white matter tract and the region identified by SVR-LSM (calculated either as a proportion of the white matter tract or as a proportion of the SVR-LSM region).
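The two overlap proportions described above can be sketched as set operations on voxel indices. This assumes the atlas probability map and the SVR-LSM result share the same template space and voxel indexing. The probability values in the example are invented.

```python
from typing import Sequence, Set, Tuple


def atlas_region(prob_map: Sequence[float], threshold: float = 0.5) -> Set[int]:
    """Voxels assigned to a tract in at least `threshold` of the atlas population."""
    return {i for i, p in enumerate(prob_map) if p >= threshold}


def overlap_proportions(tract: Set[int], svr_region: Set[int]) -> Tuple[float, float]:
    """Shared voxels as a proportion of (tract size, SVR-LSM region size)."""
    if not tract or not svr_region:
        raise ValueError("both regions must be non-empty")
    shared = len(tract & svr_region)
    return shared / len(tract), shared / len(svr_region)


tract = atlas_region([0.9, 0.6, 0.4, 0.5, 0.1])   # voxels {0, 1, 3}
svr = {3, 4, 5, 6}
of_tract, of_svr = overlap_proportions(tract, svr)  # (1/3, 1/4)
```

The two proportions share a numerator but have different denominators. A small tract that falls entirely inside a large SVR-LSM cluster would therefore score high on the first and low on the second. Reporting both guards against that asymmetry.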

As described above, we chose proportion of words in sentences as a measure of grammatical deficits because it reflects the ability to structure sentences and is minimally influenced by other cognitive and articulatory abilities. Nevertheless, articulatory and word retrieval deficits may contribute to reduced proportion of words in sentences. To address this, we conducted follow-up analyses of proportion of words in sentences controlling for PEP as a measure of articulatory deficits and controlling for picture naming accuracy (from the PNT) as a measure of word retrieval ability. The SVR-LSM results did not change substantially: after FDR correction, the remaining voxels were primarily in inferior and middle frontal gyri, with some extension into the precentral and postcentral gyri.

To distinguish sentence structure processing deficits in production from related deficits in comprehension, we conducted a follow-up analysis of proportion of words in sentences controlling for comprehension of reversible sentences. Reversible sentence comprehension scores came from a sentence-to-picture matching task (Breedin & Saffran, 1999) in which sentences such as “The man serves the woman” were presented with a picture of a man serving a woman (target) and a picture of a woman serving a man (distractor). Correct responses in this task require correctly assigning thematic roles in the sentences (i.e., identifying the actor/agent and the recipient/patient of the action). Prior studies have found that impaired comprehension of such sentences is associated with damage to the temporo-parietal cortex (Race, Ochfeld, Leigh, & Hillis, 2012; Thothathiri et al., 2012). Our follow-up analysis showed that controlling for comprehension of reversible sentences did not change the pattern of lesion correlates of proportion of words in sentences: the primary lesion correlates were still in the inferior and middle frontal gyri, although with greater extension into the superior frontal gyrus and the precentral and postcentral gyri. The results of all of the SVR-LSM analyses are summarized in Table 2.

Table 2. Number of voxels in each region of interest for each analysis where any voxels survived FDR correction. All regions refer to left hemisphere only

Alternative Analyses

Because SVR-LSM is a relatively new analysis technique, best practices in its implementation are still under investigation. A recent reimplementation (DeMarco & Turkeltaub, 2018) offers alternative methods of lesion volume control and multiple comparisons correction. We reanalyzed the data using this implementation, controlling for lesion volume by regressing it out of both behavioral data and raw lesion data (“Regress on Both”) and correcting for multiple comparisons using cluster size correction based on 10,000 permutations. The results were broadly the same as reported above, with only slight differences in some of the follow-up analyses. PEP was associated with a significant cluster in the postcentral gyrus and inferior parietal lobule, which also was significant after controlling for phonological (nonword) errors in picture naming. Analysis of proportion of closed class words identified no significant clusters after correction. Proportion of words in sentences was associated with a significant cluster in the frontal lobe, which remained significant after controlling for picture naming accuracy (PNT) and reversible sentence comprehension, but not after controlling for PEP.
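Cluster-size correction can be sketched as follows. This is not the DeMarco and Turkeltaub code. It assumes that a suprathreshold voxel mask has already been produced for the observed data and for each permutation, each of which would require refitting SVR on permuted scores. It also assumes 6-connectivity and a family-wise null built from the largest cluster in each permutation. Both are common conventions, but they are assumptions here. All function names are hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple

Shape = Tuple[int, int, int]
NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def cluster_sizes(mask: Sequence[bool], shape: Shape) -> List[int]:
    """Sizes of 6-connected clusters of suprathreshold voxels in a flattened 3D mask."""
    nx, ny, nz = shape

    def flat(x: int, y: int, z: int) -> int:
        return (x * ny + y) * nz + z

    seen = set()
    sizes = []
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                start = flat(x, y, z)
                if not mask[start] or start in seen:
                    continue
                seen.add(start)
                queue = deque([(x, y, z)])
                size = 0
                while queue:  # breadth-first flood fill of one cluster
                    cx, cy, cz = queue.popleft()
                    size += 1
                    for dx, dy, dz in NEIGHBORS:
                        px, py, pz = cx + dx, cy + dy, cz + dz
                        if 0 <= px < nx and 0 <= py < ny and 0 <= pz < nz:
                            j = flat(px, py, pz)
                            if mask[j] and j not in seen:
                                seen.add(j)
                                queue.append((px, py, pz))
                sizes.append(size)
    return sizes


def cluster_p_values(observed: Sequence[bool], null_masks: Sequence[Sequence[bool]],
                     shape: Shape) -> List[Tuple[int, float]]:
    """Pair each observed cluster size with the share of permutations whose largest
    cluster was at least as big (max-statistic null for family-wise control)."""
    null_max = [max(cluster_sizes(m, shape), default=0) for m in null_masks]
    return [(s, sum(m >= s for m in null_max) / len(null_max))
            for s in cluster_sizes(observed, shape)]


# Tiny 3 x 3 x 1 example: observed clusters of sizes 2 and 1, three permutations
obs = [i in (0, 1, 8) for i in range(9)]
nulls = [[i == 4 for i in range(9)], [False] * 9, [i in (0, 3, 6) for i in range(9)]]
result = cluster_p_values(obs, nulls, (3, 3, 1))
```

Because only the largest null cluster enters the null distribution, one big observed cluster can pass while smaller neighboring clusters do not. That behavior relates to the concern about over-large clusters raised next. As with voxel-wise permutation p-values, this count-based estimate can be exactly 0.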

Cluster-level correction may not be a good strategy for SVR-LSM, because it tends to produce a single, overly large cluster (Mirman et al., 2018), which will tend to overestimate the contribution of one region and underestimate the contributions of others. Therefore, we also recomputed FDR-corrected thresholds using voxel-wise p-values from the 10,000 permutations, but those results were substantially less conservative than our initial analyses. Finally, to eliminate incorrect estimates of p = 0 for some voxels in our original 2000-permutation analyses, we “regularized” those p-values (for a discussion of regularization, see Donnelly & Verkuilen, 2017) by shifting them away from that floor either by a fixed amount (0.5/2000) or by random sampling. This produced FDR thresholds that were virtually identical to the original ones, suggesting that the presence of p = 0 values did not substantially distort the FDR calculation. In sum, alternative strategies for lesion volume correction and multiple comparisons correction produced substantively the same pattern of results, indicating that the observed patterns are relatively robust to details of the analysis strategy.
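Why p = 0 arises, and how the regularization step feeds into the FDR threshold, can be sketched as follows. This Python sketch does not reproduce the analysis code. It assumes one-sided voxel-wise permutation p-values computed as the fraction of permutations whose statistic is at least as large as the observed one, and the standard Benjamini-Hochberg step-up procedure for the FDR threshold. The random-sampling variant draws uniformly from (0, 1/n_perm], which is an assumed choice. All function names are hypothetical.

```python
import numpy as np


def permutation_p_values(observed, null):
    """One-sided voxel-wise permutation p-values.

    observed: shape (n_voxels,), statistics from the real data.
    null: shape (n_perm, n_voxels), statistics from permuted data.
    p is the fraction of permutations with null >= observed, so it is
    exactly 0 when no permutation reaches the observed value.
    """
    return (np.asarray(null) >= np.asarray(observed)).mean(axis=0)


def regularize_zero_p(p, n_perm, method="fixed", rng=None):
    """Move p = 0 values off the floor; all other p-values are unchanged.

    method="fixed": replace 0 with 0.5 / n_perm (e.g., 0.5/2000).
    method="random": replace 0 with a draw from (0, 1/n_perm] (assumed choice).
    """
    p = np.asarray(p, dtype=float).copy()
    zero = p == 0
    if method == "fixed":
        p[zero] = 0.5 / n_perm
    elif method == "random":
        rng = np.random.default_rng() if rng is None else rng
        # rng.random draws from [0, 1), so 1 - draw lies in (0, 1]
        p[zero] = (1.0 - rng.random(int(zero.sum()))) / n_perm
    else:
        raise ValueError(f"unknown method: {method}")
    return p


def bh_threshold(p, q=0.05):
    """Benjamini-Hochberg FDR threshold: the largest sorted p(k) with
    p(k) <= k * q / m, or None if no p-value passes."""
    p_sorted = np.sort(np.asarray(p, dtype=float))
    m = p_sorted.size
    passing = p_sorted <= np.arange(1, m + 1) * q / m
    if not passing.any():
        return None
    return p_sorted[np.flatnonzero(passing).max()]
```

Voxels with p at or below the returned threshold would be declared significant. Comparing `bh_threshold` on the raw and regularized p-values is one way to check whether the p = 0 floor changes the threshold.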

Discussion

Fluent speech production requires the rapid coordination of a wide range of cognitive processes, including executive functions and working/short-term memory, grammatical knowledge, semantic memory, and articulatory planning and execution. There is broad agreement among prior studies that fluent speech production is supported by frontoparietal brain regions. The present study provides a finer-grained investigation of the subcomponents of this system by using behavioral measures of articulatory and grammatical deficits that minimize the influence of other cognitive processes and by using a multivariate lesion-symptom mapping method (SVR-LSM), which is better able to detect the contributions of multiple brain regions.

Articulatory deficit was measured by production of phonetic distortion errors (such as distorted articulation, intrusive schwa, impaired prosody, and violations of English phonotactics) in a word repetition task. The single-word repetition task minimizes lexical-semantic and higher-level cognitive demands, and scoring phonetic distortions allowed us to target articulatory planning and execution. Individuals with significant peripheral articulatory deficits (dysarthria) were excluded, so this measure effectively isolated central articulatory planning and execution deficits. The SVR-LSM results revealed that these deficits were associated with damage to the postcentral gyrus and the inferior parietal lobule, overlapping very strongly with lesion correlates of phonological errors in picture naming. This finding differs from other studies that suggested a more anterior locus of articulatory deficits, whether in apraxia of speech (Basilakos et al., 2015; Itabashi et al., 2016) or based on assessments of errors in connected speech (e.g., Wilson et al., 2010). Assessing articulatory deficits in the context of connected speech may introduce contributions from frontal, sentence-level components of fluent speech production. Similarly, although apraxia of speech is characterized by articulatory deficits, it strongly co-occurs with nonfluent forms of aphasia, so clinical definitions of AOS do not isolate articulatory deficits. A recent study found that single-word repetition errors were associated with posterior peri-Sylvian damage, whereas articulatory deficits in spontaneous speech were associated with anterior insula and inferior frontal damage (Ripamonti et al., 2018). That is, LSM results for articulatory deficits in prior studies may have been shifted anteriorly by contributions from other, co-occurring fluency deficits. By measuring articulatory deficits in a word repetition task, the present study minimized such effects and more precisely identified the postcentral and inferior parietal lesion correlates of articulatory deficits as a subcomponent of apraxia of speech.

Grammatical deficits were measured by proportion of closed class words (morpho-syntax) and proportion of words in sentences (sentence structure) produced in the context of a semistructured narrative (telling the Cinderella story). These measures minimize executive function and working/short-term memory demands, because they do not require production of long sentences or complex grammatical structures, and they minimize articulatory demands, because they do not require a fast speech rate or production of difficult bound morphemes. Thus, these measures effectively isolate the morpho-syntactic and sentence structuring aspects of grammatical deficits. Proportion of closed class words did not have consistent lesion correlates. The cross-validation accuracy during SVR-LSM parameter optimization was statistically significant, indicating some systematic relationship between lesion pattern and proportion of closed class words; however, no voxels survived FDR correction for multiple comparisons, so it is not clear which aspects of the lesion pattern were associated with this grammatical deficit measure. Proportion of words in sentences was strongly associated with frontal lobe damage, particularly in the inferior and middle frontal gyri. This pattern remained after controlling for articulatory deficits, word retrieval ability, and reversible sentence comprehension. It converges with a study of PPA that dissociated fluency from grammatical deficits and found that fluency deficits (mean length of utterance, MLU) were associated with middle frontal gyrus degeneration, whereas grammatical deficits were associated with distributed frontoparietal degeneration (Rogalski et al., 2011).
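Both grammatical measures are simple ratios over the words of a narrative. The Python sketch below assumes that a transcript has already been segmented into utterances and that each utterance has been hand-coded as a sentence or not; the `Utterance` class and the short `CLOSED_CLASS` word list are hypothetical placeholders, not the coding scheme used in this study.

```python
from dataclasses import dataclass

# Hypothetical, deliberately incomplete closed-class list (illustration only).
CLOSED_CLASS = frozenset({
    "a", "an", "the", "and", "but", "or", "to", "of", "in", "on", "at",
    "he", "she", "it", "they", "her", "his", "their",
    "is", "was", "were", "be", "has", "had", "will",
})


@dataclass
class Utterance:
    words: list[str]
    is_sentence: bool  # assumed to be hand-coded beforehand


def grammatical_measures(utterances):
    """Return (proportion of closed class words, proportion of words in sentences),
    both computed over all words in the narrative."""
    words = [w.lower() for u in utterances for w in u.words]
    if not words:
        raise ValueError("narrative contains no words")
    n_closed = sum(w in CLOSED_CLASS for w in words)
    n_in_sentences = sum(len(u.words) for u in utterances if u.is_sentence)
    return n_closed / len(words), n_in_sentences / len(words)


narrative = [
    Utterance(["Cinderella", "went", "to", "the", "ball"], is_sentence=True),
    Utterance(["stepsisters", "mean"], is_sentence=False),
    Utterance(["and", "the", "slipper"], is_sentence=False),
]
print(grammatical_measures(narrative))  # → (0.4, 0.5)
```

In this toy narrative, 4 of 10 words are on the placeholder closed-class list and 5 of 10 words occur in the one utterance coded as a sentence.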

Unlike some prior studies, we did not find that sentence-level fluency deficits were associated with frontal white matter damage. Two factors may have contributed to this difference between our results and previous studies. First, some of those prior studies used broader measures of fluent production, such as mean (or median) length of utterance (Catani et al., 2013), WAB Fluency (Basilakos et al., 2014; Fridriksson et al., 2013), and a composite “speech quanta” score (Halai et al., 2017). We used proportion of words in sentences in order to more precisely isolate the role of sentence structure deficits from other articulatory and cognitive deficits. The prior findings that frontal white matter damage is associated with fluency deficits may indicate that impaired integration or coordination of multiple subsystems contributes to utterance lengths, WAB fluency scores, and other such broad measures of fluency. Second, the prior studies used standard, mass-univariate VLSM, which is susceptible to mislocalization of grey matter effects into the underlying white matter (Mah, Husain, Rees, & Nachev, 2014), though that mislocalization may be mitigated by common LSM best practices (Sperber & Karnath, 2017). SVR-LSM is less susceptible to this artifact (Zhang et al., 2014), while maintaining sensitivity to effects of white matter damage (Griffis, Nenert, Allendorfer, & Szaflarski, 2017; Mirman, Zhang, et al., 2015).

Building on prior findings, the present results provide a clearer picture of the neural system that supports spoken language production and how it is impaired following left hemisphere stroke (Figure 4). This system has three main components. The first is sentence structuring: fluent narrative speech production requires the ability to structure grammatically complete descriptions of events or propositional statements involving, minimally, a noun plus a verb. This ability seems to critically involve the left inferior and middle frontal gyri and aligns with the neural basis of other types of sequential action planning (Botvinick, 2008). The second component is populating the sentence structure with actual words; that is, semantically driven word retrieval. This component was not examined in the present study, but the results of numerous previous studies converge to identify the anterior temporal lobe as the critical region for semantically driven word retrieval (Chen et al., 2018; Lambon Ralph, McClelland, Patterson, Galton, & Hodges, 2001; Mesulam et al., 2013; Schwartz et al., 2009; Walker et al., 2011), with possible additional roles for the uncinate fasciculus (Catani et al., 2013) and inferior frontal regions (Chen et al., 2018; Harvey & Schnur, 2015; Mirman & Graziano, 2013; Schnur et al., 2009). The third component is phonetic and phonological planning and execution of those words, which relies primarily on inferior parietal regions. Motor systems are, no doubt, also important for articulatory motor control, but the phonetic and phonological aspects of articulation appear to rely more heavily on action planning systems in the inferior parietal lobule and on the postcentral gyrus, presumably for online somatosensory and auditory monitoring of articulation (Hickok, 2012; Rauschecker, 2011; Rogalsky et al., 2015).

Fig. 4. Diagram of the three neural components of the fluent speech production system: frontal sentence structure component (red), anterior temporal word retrieval component (green), and parietal phonological planning and execution component (blue).

Fluent speech production also requires working memory and related executive functions in order to formulate a coherent narrative, maintain it, and update the speaker’s place in that narrative. Working memory and executive functions can be impaired in aphasia (Halai et al., 2017; Lacey et al., 2017), which could produce a different sort of impairment in narrative speech production (and comprehension). In the present study, we sought to identify distinct components of the speech production system, so we used measures that placed minimal demands on other components; however, fluent speech production requires coordination among all of these components. Further, these components are likely to interact during speech production. For example, articulatory deficits may influence lexical selection (the speaker may prefer words that are easier to articulate), grammatical production (omission of bound morphemes), and sentence structure (the speaker may prefer shorter sentences). Understanding the system’s components should therefore be complemented by understanding how those components interact.

Clinical measures of fluent language production, such as the WAB fluency subscore, mean (or median) length of utterance, words per minute, etc., typically do not distinguish between different sources of nonfluent speech production. As a result, they can reflect deficits in any of these components or multiple components (Caplan, 2012). The present results show that these components have quite distinct neural substrates. In addition to providing new insights into the neural basis of spoken language production, these results suggest that diagnosis and treatment of post-stroke nonfluent language deficits should carefully consider whether the nonfluency is arising from deficits at grammatical, lexical-semantic, or phonological/phonetic levels. Both finer-grained behavioral assessment and consideration of precise lesion location can help to localize the deficit. Similarly, treatments that improve grammatical processing are unlikely to improve phonological retrieval and encoding, and vice versa. Therefore, treatment research and selection should distinguish between sources of nonfluency.